PROCEEDINGS
(Plaintiffs' Deposition Exhibit 3546 was 
marked for identification.) 

(Witness sworn.) 

DONALD B. RUBIN 

called as a witness, being first duly sworn, 
was examined and testified as follows: 

ADVERSE EXAMINATION 

BY MR. LOVE: 

Q. Professor Rubin, I just introduced myself to you 
a couple minutes ago but I'll repeat that I'm John 
Love, I'm with the law firm of Robins, Kaplan, Miller 
& Ciresi, and as you know, our firm represents the 
State of Minnesota, Blue Cross and Blue Shield of 
Minnesota in the tobacco litigation that's currently 
ongoing in Minnesota against various tobacco 
companies and industry groups. You are familiar with 
this litigation generally? 

A. Yes, I am. 

Q. And you were deposed earlier in this case? 

A. Yes. 

Q. October 6 and 7, I believe? 

A. The dates I don't remember exactly but it sounds 
about right. 

Q. You understand this is a continuation of your 
1 deposition from before?

2 A. Yes, I do. 

3 Q. And that all the rules about depositions that 

4 applied to that deposition apply to this deposition 

5 as well? 

6 A. Yes, I understand that. 

7 Q. And just as before, it's important for you to 

8 give an audible answer to all of my questions so the 

9 court reporter can take it down, nods of the head and 

10 so forth really he can't record accurately. 

11 A. I understand. 

12 Q. If I ask you any questions today you don't 

13 understand, please let me know and if you answer the 

14 question, I will assume that you understood the 

15 question. Is that fair enough? 

16 A. Yes, that is. 

17 Q. If you need to take a break at any point during 

18 the deposition today, just let me know and we will 

19 certainly accommodate you. 

20 A. Okay. Thank you. 

21 Q. Professor Rubin, I'll show you what we have 

22 marked as Plaintiffs' Exhibit 3546, titled 

23 Supplemental Expert Report of Donald P. Rubin, Ph.D., 

24 January 10, 1998. 

25 A. I hope it says B. Rubin. 
1 Q. I'm sorry, did I say P?

2 A. You did.

3 Q. It does say B.

4 MR. BIERSTEKER: It's a little early and

5 it's Monday. Thanks.

6 Q. Professor Rubin, is that an accurate copy of

7 your supplemental expert report in this case?

8 A. Glancing through it, it looks that way.

9 Q. You don't recall there being any attachments to

10 it or so on that aren't included there?

11 A. I'll take a slightly more careful look.

12 Q. Sure.

13 A. No, I believe this is everything.

14 Q. If you will excuse me one second.

15 (Discussion off the record.)

16 BY MR. LOVE:

17 Q. Professor Rubin, I'll show you what was

18 previously marked in your earlier deposition as 

19 Plaintiffs' Exhibit 3540, now marked as trial Exhibit 

20 2273, Report of Professor Donald B. Rubin in the 

21 Minnesota Cigarette Litigation, July 19, 1997. 

22 A. Okay. 

23 Q. That's your original expert report; correct? 

24 A. It certainly looks that way, although I do think 

25 there were some attachments. There was an attachment 
1 to this maybe that's not here.

2 Q. Let me show you what was marked as Exhibit 3544 

3 to your first deposition, it's also marked trial 

4 Exhibit 2277, a memo to you from T. E. — 

5 A. Raghunathan. 

6 Q. Raghunathan. Is that what you were referring to 

7 that was attached? 

8 A. No. I believe in the original report there may 

9 have been a figure for the missing data part. 

10 Attached hereto as Exhibit 2 is a computer diskette 

11 containing a simulation — I guess an industry report 

12 was attached, it's a diskette. That's why it's not 

13 on paper. 

14 Q. All right. 

15 A. Okay. 

16 Q. I'll show you one more document that was marked 

17 as your original deposition, it was marked as exhibit 

18 — Plaintiffs' Exhibit 3545, it's also marked trial 

19 Exhibit 2278. Do you recall that from your original 

20 deposition? 

21 A. Yes, I do. Yeah. 

22 Q. Other than the computer diskette, are exhibits 

23 34 — 3545, 3544 and 3540 the work product that you 

24 produced prior to your supplemental report? 

25 A. Yes. 
1 Q. And do these four exhibits that we have just

2 identified contain all the expert opinions that you 

3 have reached in this case? 

4 A. I don't think so. There certainly are other 

5 opinions that I have that are probably not summarized 

6 in these documents. 

7 Q. All right. Please tell me what other opinions 

8 you have reached in this case as an expert. 

9 A. That's probably easier to come up with in 

10 response to specific questions. I guess my answer 

11 just is, in saying everything I've thought about in 

12 the case and all the documents I've read is not 

13 summarized under these few pages. 

14 (Interruption by the reporter.) 

15 A. Are not summarized under these few pages that are

16 given here. If you are saying are there any other 

17 formal opinions that I have written down and 

18 distributed to people, I believe this is the 

19 collection I've written down. Certainly there are 

20 many more things in my head that I've thought about 

21 that aren't written down here, I believe. 

22 Q. Have you reached any other expert opinions that 

23 you intend to describe to the jury during the trial 

24 of this case if you are called as a witness? 

25 A. In response to questions, it's possible. If 
1 somebody asks a question that involves something

2 that's other than summarized here, then I would 

3 respond. 

4 Q. Can you give me any examples at all? 

5 A. Since the last document, which I think is the 

6 supplemental report, I certainly had been asked to 

7 read a variety of documents. Even in preparation for 

8 this deposition I think I was asked to read 10,000 

9 pages in one day's notice, approximately, for this

10 deposition. Obviously I didn't do all of that but 

11 there are things that I've been asked to read since 

12 January 10 and so I probably have opinions about 

13 those documents that aren't summarized in a document 

14 written before that date. 

15 Q. Other than your opinions about other documents 

16 that you have looked at, have you formed any expert 

17 opinions of your own besides those you have expressed 

18 in the four exhibits that we discussed already? 

19 A. I probably have formed opinions based on the 

20 readings that I've done since then. 

21 Q. Can you tell me what those opinions are? 

22 A. I can in — I can in response to specific 

23 questions. I don't come prepared with a list of all 

24 things that I've thought of since I've — while 

25 reading those other documents. 
1 Q. Well Professor Rubin, I don't know what's in

2 your head, obviously, so I can't ask you if X is in 

3 your head if there is unlimited number of Xs. 

4 A. That's right. That's my problem as well. 

5 Q. So I'm asking is there anything that you can 

6 recall going through your memory of opinions you have 

7 come to as an expert in this case that aren't 

8 expressed in these four documents. 

9 A. I can try to give an example. 

10 Q. You can give me as many examples as you can and 

11 if you say it can't be exhaustive, I understand 

12 that. 

13 A. It certainly cannot be exhaustive. 

14 Q. But as many examples as you can possibly give 

15 me. 

16 A. It might be helpful if I had a list of documents 

17 I was asked to read for this. 

18 Q. Sure. 

19 A. Then I can — that might help refresh my 

20 memory. One thing, I was asked to read some 

21 documents regarding imputations in NMES that were 

22 done when the Zeger team got the data set. 

23 (Interruption by the reporter.) 

24 A. The Zeger team did their imputations and I 

25 expressed some opinions, and I think in my first 
1 report, about imputations that were done including

2 ones done by Zeger, but I've read some documents 

3 since then, for example, that indicate that people, I 

4 think the article was by Sommers — 

5 Q. Right. 

6 A. — that certainly he was — indicated concern 

7 with the excessive amount of nonresponse in NMES 

8 within the agency, and so I wasn't as aware of what 

9 the agency's concerns were before reading that. 

10 Q. I understand you would like to see a list of the 

11 documents we designated for your deposition. That 

12 would help. 

13 A. That would help to at least get going on some 

14 kinds of comments, and so — 

15 MR. LOVE: Would you mark this as the next 

16 exhibit, then. 

17 (Plaintiffs' Deposition Exhibit 3547 was 

18 marked for identification.) 

19 BY MR. LOVE: 

20 Q. Before I show you that list of documents that we 

21 designated for your deposition, let me ask you, 

22 Professor Rubin, had you reached other expert 

23 opinions in this case besides what are expressed in 

24 the four documents we have discussed already, up to 

25 and including the time that you submitted your 
1 supplemental expert report, Exhibit 3546?

2 A. Let me be sure I understand the question. Have 

3 I — do I — had I formed other opinions prior to 

4 January 10 that are not expressed in these reports? 

5 Q. Right, January 10, 1998. 

6 A. I certainly believed the major opinions that I 

7 had formed prior to January 10 are represented in 

8 these reports. 

9 Q. By "major" do you mean any opinions of 

10 significance? 

11 A. Of most significance. But I believe if — Let 

12 me start that sentence again. 

13 When I read through my prior deposition in this 

14 case, there certainly were lots of questions asked 

15 and I provided answers. I think those answers 

16 expanded upon major points in these reports. My 

17 memory is that the deposition went for a couple days 

18 so there are obviously things raised in the 

19 deposition and opinions offered in the deposition 

20 that aren't strictly represented here. 

21 Q. Let's exclude opinions expressed in the four 

22 reports or during your previous deposition. Are 

23 there any other expert opinions that you have formed 

24 as of January 10, 1998? 

25 A. There may be, there may have been some that were 

STIREWALT & ASSOCIATES 

P.O. BOX 18188, MINNEAPOLIS, MN 55418 1-800-553-1953 




1 not asked about in the deposition by Mr. Hamlin but I 

2 certainly couldn't — couldn't recover them now. 

3 Q. I will show you what we have marked as 

4 Plaintiffs' Exhibit 3547, the supplemental 

5 pre-designation of documents for your deposition 

6 today, Professor Rubin.

7 A. Okay. 

8 Q. Let me go through those one at a time and see if 

9 that helps you recall any opinions that you have 

10 reached that aren't expressed in the documents, four 

11 exhibits we discussed earlier or your previous 

12 deposition. 

13 A. The first two are supplemental — sorry. Let me 

14 correct that. The first one is my supplementary 

15 report. The next two are supplemental reports of 

16 Brian McCall and William Wecker, which I had not seen 

17 before, and I did read through them so I have 

18 opinions about those documents and the issues they 

19 raise. The next one, two, three, four I believe are 

20 orders from the courts which I glanced through to get 

21 a feeling for what the underlying legal decisions had 

22 been regarding what's — what's allowable, although 

23 obviously, not being a lawyer, I read those with a 

24 lay eye, not an expert eye. 

25 The next one is a supplemental report of Zeger, 
1 Wyant and Miller which I had seen before. The next

2 listing is all demonstrative exhibits, which I had 

3 not seen. 

4 Q. All right. And I believe Mr. Biersteker told me 

5 there was one demonstrative exhibit — you may not 

6 know it by that name — that you were involved in 

7 preparing and we can discuss that later. 

8 A. That's the colorful one? 

9 Q. Yes. 

10 A. That I was involved in producing, and so 

11 actually — and that is an example of opinions that I 

12 would have now regarding the missing-data issues that 

13 I would not have been able — I would not have had 

14 with the same, hopeful, clarity I have now because 

15 it's so involved and intricate, and there is a 

16 summary now that I can refer to. 

17 The surgeon general's reports I did not look 

18 at. I may have looked at a page or two. The 

19 Evolving Role of Statistical Assessments as Evidence 

20 in the Courts, that's a book and I did look at the 

21 pages that were apparently referred to at trial. 

22 "Heart and Stroke Facts," I glanced at that. 

23 Sommers' study, that I did, I did look at that. 

24 I may have seen that — There are a couple Sommers 

25 articles and I don't have them delineated well enough 
1 to remember on the imputation that was done in NMES

2 by the agency. I did look at Milliman & Robertson, 

3 which is a — 

4 Q. Actuarial firm? 

5 A. Actuarial firm. Thank you. 

6 I did glance through the settlement agreement 

7 but again from a lay eye. 

8 I did quickly read through number 16, the 

9 Miller, Zhang, Novotny, Rice & Max report. 

10 I did glance at the Nurses Health Study, I'm 

11 aware of the Nurses Health Study as a separate study 

12 so I have a general understanding of what that study 

13 is. 

14 I guess in the interim also, in the context of 

15 reviewing reports for similar litigation in other 

16 states, I have read other documents that haven't been 

17 filed in this state but have been related to other 

18 states which have — which are attempting to do the 

19 same kinds of analyses, though not the same way 

20 necessarily, and so there is — there are opinions 

21 that I have about that work and about the — what 

22 they are trying to learn, what they are trying to do 

23 in those analyses that I believe are relevant. 

24 That's a broad brush of the documents that I've 

25 looked at and since January 10 I think that probably 
1 includes most of them. This last comment regarding

2 other cases, there are lots of documents in those 

3 other cases but I would have opinions about those 

4 documents, I have opinions about issues raised in 

5 those documents, and some of those opinions would at 

6 least have bearing on my opinions on what's being 

7 done here. 

8 Q. You mentioned that you read documents in other 

9 state cases other than the Minnesota case? 

10 A. Correct. 

11 Q. Which — You have some opinions having reviewed 

12 those documents that might apply to the Minnesota 

13 case as well; correct? 

14 A. Correct. 

15 Q. Please tell me about those opinions. 

16 A. The — Let me just ask Peter, this is — there 

17 is no problem talking about this, Peter? 

18 MR. BIERSTEKER: No, you can go ahead and 

19 start, although obviously there is going to be a 

20 certain level of detail beyond which — we haven't 

21 made designations in all cases yet so in some ways 

22 this is consulting as opposed to testifying, although 

23 I anticipate Professor Rubin will be eventually a 

24 testifying expert. So we will let you talk about 

25 some of this stuff, but I don't want to get too far 
1 into it.

2 THE WITNESS: So you can't look out the 

3 window. 

4 MR. BIERSTEKER: I won't look out the 

5 window. 

6 A. The two cases I've been looking at most recently 

7 are Oklahoma and Washington State, and in Washington 

8 State the primary document that I've been looking at, 

9 there have been a lot of supporting documents as 

10 well, is the report by — reports by Jeffrey Harris, 

11 and that's — that report is of interest because he 

12 explicitly tries to address the misconduct issue and 

13 explicitly tries to estimate how behavior and medical 

14 expenses would have been different had the alleged 

15 misconduct not occurred, alleged misconduct by the 

16 tobacco industry. 

17 Related to that, there were a collection of 

18 documents that he referred to including a book by 

19 Manning dot dot Newhouse dot dot dot that I had never 

20 looked at before, and I have glanced at that, read 

21 parts of that as well, and there are other documents 

22 in addition that Jeffrey Harris referred to that are 

23 — that I do not believe were designated out of that 

24 — I don't know the technical meaning, I don't mean 

25 it that way, but they didn't arise. This is my 
1 reading of documents for Minnesota. In fact, there

2 is one that comes to mind now, there is a new report 

3 by I think it was Miller in November of '97 — I 

4 think that's the date — that I hadn't seen before, 

5 in which, for example, he refers to the missing-data 

6 problem in a different way than he did earlier. 

7 Q. This is by Leonard Miller or Vincent Miller? 

8 A. I'm blocking. 

9 MR. BIERSTEKER: Actually, I expect if it's 

10 a state other than Minnesota it would be Vincent, but 

11 was it published or a litigation-type document? 

12 THE WITNESS: I remember the date of 

13 November of '97 and I do not believe it's published. 

14 MR. BIERSTEKER: It's probably Vincent. 

15 A. There are other documents like that as well that 

16 Harris referred to that I was asked to look over. In 

17 the Oklahoma case, the main — 

18 Q. Let me just — 

19 A. I'm sorry. 

20 Q. Stop with the Washington case for a minute. 

21 A. Sure. 

22 Q. Are there any opinions that you formed in 

23 connection with your review of these documents in the 

24 Washington case that you believe apply to the 

25 Minnesota case? 
1 A. Yes.

2 Q. And what are those? 

3 A. Well the primary one has to do with the 

4 feasibility of attempting to do an analysis that 

5 addresses the alleged misconduct and the changes in 

6 behavior and healthcare costs in time that may have 

7 resulted from that alleged misconduct. Harris 

8 directly addresses it. Not necessarily the way I 

9 would do it, I'm not saying that. I think there are 

10 problems there. But he does address that issue in a 

11 serious way, so that indicates that — I think it can 

12 be addressed. Again I'm not endorsing everything he 

13 did because I think there are serious problems there, 

14 but it is a — it is a serious first-pass attempt to 

15 address the question in — that has longitudinal 

16 aspects in time, addresses misconduct and attempts to 

17 address the issue of the effect of misconduct on both 

18 behavior and healthcare costs. Also the fact that in 

19 the other report by Miller, I guess it's Vince, that 

20 he — he was a little bit more aware of problems with 

21 missing data than he was before, so he refers, for 

22 example, to my '87 textbook with Rod Little, and 

23 there exists more sophisticated methods to handle 

24 missing data, although he doesn't attempt to use it, 

25 but at least he is up one notch from where he was 
1 before in awareness.

2 Q. When he says, refers to, his '87 textbook or 

3 your — 

4 A. My '87 textbook. 

5 Q. Rubin and Little? 

6 A. Yeah, Little and Rubin. 

7 Q. Little and Rubin. 

8 A. Right. 

9 Q. What about the discussion of missing data do you 

10 — led you to form any opinions that applied in the 

11 Minnesota case other than those expressed in the four 

12 reports you previously looked or your deposition in 

13 this case? 

14 A. I'm not sure I understood that question. 

15 Q. Sure. What about your review of this discussion 

16 of missing data in the Washington case that led you 

17 to form opinions that apply to the Minnesota case 

18 other than those you expressed in the previous 

19 reports that we discussed and at your previous 

20 deposition here in the Minnesota case? 

21 A. Well previously it appeared to me that the 

22 methods of handling missing data were extremely naive 

23 relative to the literature and work that's been done 

24 in the past two decades on handling missing data 

25 problems in general and specifically in government 
1 survey type data sets. The fact that Miller referred

2 to this book, which I think still is regarded as 

3 being the standard reference on missing data even 

4 though it's more than a decade old, indicates to me 

5 that — excuse me — that there was some awareness 

6 that there is a literature that's out there but they 

7 weren't really — he wasn't, not "they," he wasn't 

8 really prepared to try to capitalize on that 

9 literature because it was buried in a paragraph that 

10 said yes, there are more sophisticated methods, or 

11 sentence to that effect, advanced methods or other 

12 methods, but, his assessment, they weren't worth 

13 doing here. So the reference to the book was just 

14 the reference to the whole book, not even a chapter 

15 or a section, so it was the kind of thing, well, gee, 

16 I guess I wasn't — my reading might have been I 

17 guess I didn't know that there was — that there was 

18 this literature, I should probably refer to it, but I 

19 guess I don't want — it's too hard to do the work so 

20 I'll go back and do what I did before. That's an 

21 opinion about what they — what the level of 

22 understanding of how to deal with missing data was in 

23 dealing with the NMES data set. That's kind of a 

24 bungled sentence but I'm trying to find something — 

25 I'm trying to be very clear on what effect it has on 
1 my opinions about Minnesota.

2 Q. Right. 

3 A. And I know I was — 

4 Q. That's what I'm after, what effect does it have 

5 on your opinion in the Minnesota case? 

6 A. That's a fair question. I realize I was sort of 

7 drifting into another direction, that's why I said 

8 "bungled sentence." 

9 Q. Let me ask it so it's clear. 

10 A. Yeah. 

11 Q. What effect did reviewing this, some further 

12 discussion by an expert named Miller in the 

13 Washington case, have in terms of giving you any new 

14 opinions about the missing-data issue in the 

15 Minnesota case? 

16 A. None with respect to what actually was done but 

17 I guess what I'm saying, the way they referred to the 

18 book, the way Miller referred to the book made me 

19 reinforce the idea when they handled — when they, 

20 meaning the people handling missing data in all these 

21 — in NMES missing data in all the states, my 

22 feeling before was they just weren't aware of the 

23 issue of the literature on this very important issue 

24 and this — this quotation in the Miller book 

25 reinforced that feeling that they weren't aware of it 
1 at all, and it's oh well, we'll just go forward, so

2 it just reinforced that feeling, that's all. 

3 Q. If you look at your supplemental report, Exhibit

4 3546, and turn to page 8. 

5 A. Yes. 

6 Q. Is that where you began your discussion of what 

7 you call the causal question? 

8 A. Correct. I believe that's right, yeah. That 

9 certainly — that's what the section is titled there, 

10 although I do have some discussion of that earlier. 

11 This is a — this is the supplemental report. 

12 Q. Right. 

13 A. The propensity score methods, just to be sure, 

14 the propensity score method, which are section I, are 

15 related to the causal question, too, in the sense 

16 that if you want to address causal questions and 

17 observation studies, you do have to worry about 

18 adjusting for collection of background variables, 

19 something called confounders, whose distributions 

20 differ in the exposed and unexposed groups, in this 

21 case smokers and nonsmokers, and these propensity 

22 score methods are designed to help that process of 

23 understanding whether you can do that adjustment 

24 realistically and how to do that adjustment through 

25 background variables, these confounders. So, section 
1 I is about methods that are closely related to causal

2 questions. The section II, called Data Sets for 

3 Causal Modeling, are about data sets that maybe have 

4 something to say about — about the topic of 

5 behavioral changes and healthcare changes in time 

6 after the date of alleged misconduct. So I want to 

7 be clear that section I, although it's entitled 

8 Propensity Score Methods, it's related to causal 

9 questions as well. 
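
[Illustration, not part of the testimony.] The adjustment process described in this answer can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the analysis performed by any expert in the case: the records are invented toy data, the covariates (age group and sex) are hypothetical, and the propensity score is estimated simply as the share of smokers in each covariate cell, which is only workable when the background variables are few and discrete. The sketch checks two things the answer alludes to: whether adjustment is realistic (every cell contains both smokers and nonsmokers), and whether units with the same propensity score have similar background distributions.

```python
import math
from collections import defaultdict

# Hypothetical toy records: (is_smoker, age_group, sex). Not data from the case.
RECORDS = [
    (1, "young", "M"), (1, "young", "M"), (0, "young", "M"),
    (1, "young", "F"), (0, "young", "F"), (0, "young", "F"),
    (1, "old", "M"), (0, "old", "M"), (0, "old", "M"), (0, "old", "M"),
    (0, "old", "F"), (0, "old", "F"), (1, "old", "F"), (0, "old", "F"),
]


def estimate_propensity(records):
    """Estimate P(smoker | age_group, sex) as the smoker share in each cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [n_smokers, n_total]
    for smoker, age, sex in records:
        counts[(age, sex)][0] += smoker
        counts[(age, sex)][1] += 1
    return {cell: s / n for cell, (s, n) in counts.items()}


def has_overlap(propensity):
    """True if every cell holds both smokers and nonsmokers (0 < score < 1)."""
    return all(0.0 < p < 1.0 for p in propensity.values())


def share_male(records, propensity, score):
    """For units whose estimated propensity equals `score`, return
    (share male among smokers, share male among nonsmokers).
    Assumes both groups are non-empty in that subclass."""
    groups = {1: [], 0: []}
    for smoker, age, sex in records:
        if math.isclose(propensity[(age, sex)], score):
            groups[smoker].append(1 if sex == "M" else 0)
    return tuple(sum(g) / len(g) for g in (groups[1], groups[0]))


if __name__ == "__main__":
    ps = estimate_propensity(RECORDS)
    print("overlap:", has_overlap(ps))
    print("share male (smokers, nonsmokers) at score 0.25:",
          share_male(RECORDS, ps, 0.25))
```

In this toy data the two "old" cells share the same estimated score, and within that subclass smokers and nonsmokers have the same proportion of men, which is the balancing behavior propensity score methods are meant to deliver. With many or continuous background variables the score would instead have to be modeled, for example by logistic regression.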

10 Q. Is section I relevant to estimating 

11 smoking-attributable healthcare expenditures leaving 

12 aside the causal question relating to defendants' 

13 alleged misconduct? 

14 A. Yes. 

15 Q. So it relates to both types — 

16 A. Correct. 

17 Q. — type calculations or estimations? 

18 A. Correct. 

19 Q. Other than issues relating to what you call the 

20 causal question that we just discussed, illustrated 

21 to some extent in section II of your supplemental 

22 report, and other than what you just told me about 

23 the missing-data discussion in the Washington case, 

24 is there anything else about your review of documents 

25 in the Washington case that led you to new opinions 
1 about the Minnesota case?

2 A. Let me think for a minute. 

3 Certainly the primary one would be this temporal 

4 analysis involving the date of alleged misconduct and 

5 what would have happened otherwise. There may be 

6 other things but they certainly are not as dominant 

7 as that one. 

8 Q. My question is: Can you recall any other 

9 opinions that you believe, new opinions that would 

10 apply to the Minnesota case other than that causal 

11 opinion you just told me about? 

12 A. And the very minor point about the missing 

13 data. 

14 Q. Right. 

15 A. From Washington. 

16 The reason I'm pausing is I'm trying to think, 

17 when I glanced through, read parts of the Manning dot 

18 dot dot Newhouse dot dot dot book, whether there were 

19 things there. There were these issues of — of — 

20 it's a very economically oriented book about costs 

21 and discounting and I don't believe they really would 

22 change my opinions of anything in this case that I've 

23 talked about but it's possible if I thought harder 

24 about that something might arise, but nothing else 

25 comes to mind right now. 
1 Q. We now turn to the Oklahoma case.

2 A. Okay. 

3 Q. What new opinions that aren't expressed in the 

4 various reports you previously submitted here in the 

5 Minnesota case have you arrived at based on your work 

6 and review of documents in the Oklahoma case? 

7 MR. BIERSTEKER: Insofar as those opinions 

8 relate to Minnesota; right? 

9 MR. LOVE: Yes. 

10 MR. BIERSTEKER: Yes. 

11 A. I haven't spent as much time looking at 

12 documents for Oklahoma as I have for Washington, and 

13 the Ok — the main report there is by a fellow named 

14 Harrison. I don't remember his first name. He is an 

15 economist in South Carolina. And that analysis, I 

16 believe this is correct, does not attempt to do the 

17 misconduct, did not try to address the misconduct 

18 issue, and it's styled more like the Minnesota 

19 analysis starting with NMES and doing the adjustments 

20 for the background variables. 

21 One thing it does do that may have an effect on 

22 my opinions in Minnesota is my memory is that 

23 Harrison attempts to adjust for a larger collection 

24 of background variables than do the Zeger team. My 

25 memory is it's something on the order of — I don't 
1 know why this number comes to mind, but 38 background

2 variables that I believe he regards as important to 

3 adjust for and in a later section of his report I 

4 think he does a bunch of tests of various kinds, I'm 

5 not saying I endorse these tests, but that he does 

6 tests to check the validity of his model and 

7 indicates that he believes that the model is a good 

8 model and these factors are needed, they are needed 

9 in order to get the right kinds of predictions. I've 

10 not carefully considered that, I haven't looked at 

11 the data at all, I haven't had — looked at the files 

12 to see what I think of these variables, but the fact 

13 he is using a larger set of variables I believe and 

14 claiming that it's important to include all these 

15 variables might have an effect on my opinion in this 

16 case and the fact that a smaller set of variables is 

17 in fact used. 

18 Q. Are there any particular variables that you 

19 believe weren't used in the Minnesota work by 

20 professors — Drs. Zeger, Wyant and Miller that

21 should have been used? 

22 A. I'm trying to remember. I don't have a list of 

23 the Harrison report in mind. It may have included 

24 variables like depression or other sorts of 

25 behavioral variables. I wish now I had looked over 
1 that before coming here today and I did not.

2 Q. Do you have an expert opinion that depression is 

3 a variable that should have been used in the 

4 Minnesota analysis? 

5 A. I do not have an expert opinion on which — 

6 which variables should have been — should have been 

7 used in the analysis the way it was done. That 

8 caveat is important to me. 

9 Q. What do you mean by "in the way in which it was 

10 done"? 

11 A. Well there is an attempt to describe smoking — 

12 sorry. There is an attempt to estimate this quantity 

13 called smoking-attributable fraction or smoking- 

14 attributable expenditures as a function of certain 

15 collection of background characteristics and the way 

16 these analyses are set up, this thing called 

17 smoking-attributable fraction or smoking-attributable 

18 expenditure is a description, what I call descriptive 

19 estimate, something you are trying to estimate that 

20 describes a population within certain cells defined 

21 by these background characteristics, and it's an 

22 attempt to compare like with like, as the phrase 

23 goes, to compare smokers and nonsmokers who are alike 

24 with respect to these background characteristics, and 

25 therefore it is a descriptive quantity to be 
1 estimated in the population with these cells defined

2 by these background characteristics, and the list of 

3 background characteristics to be used in getting a 

4 smoking-attributable expenditure or 

5 smoking-attributable fraction that lists, then 

6 defines what this descriptive estimate is supposed to 

7 be, so that's one segment, that's a descriptive 

8 estimate. 

9 Now in order to think of that descriptive 

10 estimate as having any causal interest, any causal 

11 interpretation either in the sense of misconduct or 

12 in the sense of if there were never any smoking, 

13 smoking did not exist, requires a whole collection of 

14 other assumptions to be brought on top of it. So 

15 that's the sense in which I'm saying that the — that 

16 the list of those background variables to control for 

17 each list that you make up, there is a descriptive 

18 quantity and to — let me stop there. I'll let you 

19 ask a question. Sorry. 

20 Q. So to — if you are trying to estimate 

21 smoking-attributable expenditures in this descriptive 

22 way that you have described, you just described, — 

23 A. Right. 

24 Q. — and you don't deal with alleged misconduct of 

25 the tobacco defendants and you don't look at a world 
1 in which no one ever smoked but instead you compare

2 smokers with similar never smokers, I take it you 

3 don't have any expert opinion of what variable should 

4 be used in that calculation? 

5 A. I'm not sure which calculation — you mean the 

6 descriptive, descriptive calculation. The — Each 

7 descriptive calculation is another description. We 

8 can just compare smokers and nonsmokers or you can 

9 compare smokers and nonsmokers who are male or you 

10 can compare smokers and nonsmokers who are male and 

11 white or compare smokers and nonsmokers who are male, 

12 white and 34 to 65 and who are depressed and who 

13 drive recklessly, and to — so all those things are 

14 descriptive quantities and if the — if the target of 

15 estimation is a descriptive quantity like that, then 

16 given that it's a descriptive quantity, my job as a 

17 statistician is to say, okay, how do I try to 

18 estimate that quantity. Well given the data that I 

19 have at hand at this point, there are some 

20 descriptive quantities that are being set up to 

21 estimate that control for more or less of these

22 background characteristics, so my job as a 

23 statistician is to say how do you estimate those 

24 quantities well, and the Harrison report in Oklahoma, 

25 for example, his analysis adjusts for a bigger 
1 collection of background characteristics and so that

2 would say that there are more precise in the sense of

3 more precise for each person descriptive estimands that

4 some people think should be estimated than were 

5 estimated in Minnesota. 

6 Q. My question is whether you, Professor Rubin,

7 have an expert opinion as to whether more of those 

8 background variables should have been used in the 

9 Minnesota calculation to do what it purports to do. 

10 A. Should have in the sense of getting more precise 

11 comparisons in the sense of more — more conditional, 

12 more conditioning on more things and therefore more 

13 precisely relevant to a particular person or 

14 particular class of people, the bigger the collection 

15 the better. In a descriptive sense, it's more 

16 precise, so that's — so that's one answer. 

17 Q. That's one answer. But if you are trying to 

18 estimate the total cost spent, let's say, by the 

19 state of Minnesota through its Medicaid program 

20 during a certain time period, in a descriptive way 

21 that Dr. Zeger, Wyant and Miller did, do you have an 

22 expert opinion that there are other background 

23 variables they should have used that they didn't use? 

24 A. Well the descriptive question is a descriptive 

25 question and you get to decide which background you 
1 wish to condition on, so the opinion there would be

2 if you want more precise comparisons in the sense of 

3 being more specific to an individual person of 

4 certain characteristics, then the bigger the set the 

5 better. 

6 Q. My question is: If you are not trying to 

7 estimate for a particular people but you want to know 

8 if the state of Minnesota's Medicaid program, for 

9 instance, what the total expenditure was over a 18- 

10 or 19-year period, do you believe they should have 

11 had more background variables that they considered in 

12 their analysis? 

13 A. Yes, if the objective is to compare like with 

14 like, which is what I keep saying. I keep reading, 

15 not I keep saying. I apologize. I keep reading in 

16 the descriptions they want to compare like with like 

17 and in the standard epidemiological approach, that's 

18 what's to be the objective. And even if just total 

19 costs at the end are desired totaling over the whole 

20 state, if you want to make those comparisons 

21 comparing smokers and nonsmokers who are as similar 

22 as possible with respect to the background 

23 characteristics, the phrase like with like, then you 

24 should include as many as possible to do that 

25 comparison. 
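
[Illustration, not part of the testimony. The "like with like" comparison described above can be sketched as a stratified calculation: group people into cells defined by background characteristics, compare smokers with nonsmokers inside each cell, then total over cells. The data, the field names, and the simple excess-cost formula below are all hypothetical assumptions made for illustration; this is not the model used by any expert in the case.]

```python
from collections import defaultdict

def attributable_expenditure(records, cell_keys):
    """Sum, over cells, of (smoker mean cost - nonsmoker mean cost) * number of smokers.

    records:   list of dicts with 'smoker' (bool), 'cost' (float) and background fields.
    cell_keys: tuple of background field names that define the cells.
    """
    cells = defaultdict(lambda: {True: [], False: []})
    for r in records:
        key = tuple(r[k] for k in cell_keys)
        cells[key][r["smoker"]].append(r["cost"])

    total = 0.0
    for groups in cells.values():
        smokers, nonsmokers = groups[True], groups[False]
        if not smokers or not nonsmokers:
            continue  # no like-with-like comparison is possible in this cell
        excess = sum(smokers) / len(smokers) - sum(nonsmokers) / len(nonsmokers)
        total += excess * len(smokers)
    return total

# Hypothetical toy data: seven people, one background characteristic.
records = [
    {"sex": "M", "smoker": True, "cost": 100.0},
    {"sex": "M", "smoker": True, "cost": 120.0},
    {"sex": "M", "smoker": False, "cost": 80.0},
    {"sex": "F", "smoker": True, "cost": 60.0},
    {"sex": "F", "smoker": False, "cost": 50.0},
    {"sex": "F", "smoker": False, "cost": 50.0},
    {"sex": "F", "smoker": False, "cost": 50.0},
]

crude = attributable_expenditure(records, ())           # no conditioning
adjusted = attributable_expenditure(records, ("sex",))  # condition on sex
print(round(crude, 1), round(adjusted, 1))
```

[In this toy data the total changes when one more characteristic is conditioned on, which is the sense in which each list of background variables defines a different descriptive quantity.]
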
1 Q. Would you agree that the final result will

2 change less and less as the additional factors or 

3 variables are taken into consideration, have less to 

4 do with the relationship between smoking and 

5 healthcare expenditures? 

6 MR. BIERSTEKER: Just a moment. I'm going 

7 to object to the form. I'm not sure I understand the 

8 question but if you understand it, you can answer, 

9 Professor. 

10 A. I think you stated a tautology which if 

11 something is irrelevant, then when you adjust for it 

12 it will be irrelevant, but you may have meant if it 

13 appears to be, based on some analysis, unrelated, 

14 does that mean that properly adjusting for it would 

15 have no effect, and that's wrong. In other words, 

16 one of the problems I see in the analyses that have 

17 been done is the adjustments that are being done are 

18 not being done in a reliable way and if things appear 

19 to be relatively unrelated they may in fact not be, 

20 and so if the adjustments aren't being done 

21 correctly, they may — doing it correctly may have a 

22 big effect. 

23 Q. Professor Rubin, in your opinion, to do the best 

24 practical job in estimating the smoking-attributable 

25 expenditures in the ways that Drs. Zeger, Wyant and 
1 Miller did, what additional background variables

2 would you suggest be taken into consideration be 

3 used? 

4 A. Well again the — in the descriptive sense, if 

5 you want to compare like smokers and nonsmokers, you 

6 should have as long a list as you can, and the list 

7 that's available certainly includes more variables 

8 than had been included in the Zeger et al analysis. 

9 Moreover, the way one then does the comparison of the 

10 smoker-nonsmoker, conditionally given this list of 

11 background characteristics, has to be done more 

12 carefully, I believe, than was done by the Zeger 

13 team. 

14 Q. Is it your opinion that they should have used 

15 every variable that was available in NMES? 

16 MR. BIERSTEKER: I'll object. That's asked 

17 and answered. 

18 A. I believe I have answered that before in the 

19 sense that the more variables that you include the 

20 more specific an answer you get, specific to a 

21 particular type of person, and therefore the more you 

22 have a descriptive answer that is satisfying this 

23 desire to compare like smoker and like nonsmoker 

24 where like is defined by this collection of 

25 background variables. 
1 Q. If you were given the task to estimate what Dr.

2 Zeger, Wyant and Miller called smoking-attributable

3 expenditure in the state of Minnesota's Medicaid 

4 program, what additional background variables would 

5 you put into your model? 

6 A. Well I would probably want to discuss the issue 

7 with epidemiologists, perhaps medical people, perhaps 

8 economists to find out at what level of specificity 

9 they wanted this descriptive estimand to be, and then

10 I would work with them towards getting good estimates 

11 of this — of this quantity, smoking-attributable 

12 fraction or smoking-attributable expenditure at that 

13 level of specificity. 

14 Q. Have you had such conversations with other 

15 experts, health economists and so on? 

16 A. I'm thinking. It's possible. I don't think 

17 I've had specific conversations about that topic, no, 

18 but I certainly have read a variety of these reports 

19 that address the issue of what those background 

20 characteristics should be and that's the context in 

21 which the Harrison report, because it has a longer 

22 list, arose. So that somebody he talked to who is an 

23 economist, perhaps by himself or people he talked to 

24 in the Oklahoma case thought that the list should 

25 include more variables than were included in 
1 Minnesota.

2 Q. Based on what you have read so far and what you 

3 know so far, if you were given the task today, are 

4 there any additional variables that you would take 

5 into consideration? 

6 A. So could you set the stage a little bit better? 

7 It's a hypothetical situation. 

8 Q. Sure. 

9 A. I'm being retained by the state of Minnesota to 

10 redo the Zeger analysis, is that it? 

11 Q. You are being retained to estimate 

12 smoking-attributable expenditures the way Zeger, 

13 Wyant and Miller do for the Medicaid program in the 

14 state of Minnesota over a period of time, several 

15 years, and my question is: Based on what you have 

16 reviewed today and so forth, are there any other 

17 variables that you would put in your model that they 

18 did not use in their model? 

19 MR. BIERSTEKER: I'll object to form. 

20 A. Well I guess I would say I would look around at 

21 other analyses that people had done like the Harrison 

22 one. Maybe the Harrison one, although I guess he 

23 refers to documents done by other people, and would 

24 look to see the collection of variables that other 

25 people think are worthwhile adjusting for, and then I 
1 would say, okay, let's try to adjust for the biggest

2 collection we can, which includes everything that 

3 someone might think is important, and if someone says 

4 no, I want to back off that, like back off using 

5 depression and get this descriptive smoking- 

6 attributable expenditure, smoking-attributable 

7 fraction, not adjusting for depression, then that's 

8 easy to do having already done it for depression, 

9 it's easy to go the other way from an intellectual 

10 sense, at least. What I would do then, I would 

11 include at the beginning an attempt to adjust for any 

12 of the background variables that anyone ever thought 

13 were relevant to see what would happen and then when 

14 someone says, okay, I want to leave that one out, 

15 then we could back off and leave that one out. 

16 Having done the harder task first, it's easier to 

17 back off, so I guess I would adjust for anything. I 

18 would do it in a model at least where I would get 

19 those quantities at a more precise level, more 

20 specific level than they did because other people 

21 think it's worthwhile doing. And just as in the — 

22 as in the final sort of analyses that are done, there 

23 are many more variables in NMES that are used to get 

24 the smoking-attributable, refined analysis at least, 

25 that are used to get smoking-attributable 
1 expenditures, smoking-attributable fractions than are

2 used in the summary tables where they just do it by 

3 age and sex, for example. 

4 So they do it — they — they essentially do the 

5 same thing that I'm saying I would do; that is, they 

6 adjust for a bigger collection and they report at a 

7 more accumulated level, more summary level, and I'm 

8 saying let's do the analysis, then, and if other 

9 people think you should adjust for background 

10 variables, let's try to adjust with a full set and 

11 then accumulate the results, collapse the results if 

12 desired. 

13 Q. And sitting here today, based on whatever 

14 conversations you have had with other people, 

15 whatever, you have been able to read from other 

16 experts in other fields, are there any particular 

17 variables you can tell me you would include in your 

18 model? 

19 A. I think I've answered that, that I would try to 

20 include all the variables that anyone thinks are 

21 important. 


22 Q. I'm asking —

23 A. Specifically.

24 Q. — identify those variables for me.

25 A. Okay. Let's see. I think depression is one
1 that people had mentioned, maybe alcohol consumption

2 is another one that people have mentioned. I think 

3 there are other indicators of risk-taking behavior 

4 that people have mentioned that lead to other kinds 

5 of medical costs. I haven't looked at the Harrison 

6 report for weeks so I wish I had it in front of me, 

7 which I do not. 

8 Q. That's what you can recall today? 

9 A. Pardon? 

10 Q. That's what you can recall today?

11 A. That's what I recall today at this moment. 

12 During the day, lunch, other things may arise, and so 

13 — but sitting here right at this moment, those are 

14 the ones that pop to mind. 

15 Q. If other things arise, please let me know. 

16 A. Sure, I'll make a note of that. I suspect they 

17 will because it's the kind of thing you can't recall 

18 it when asked the — can't recall when asked the 

19 question directly, but they are in the back of your 

20 mind so they pop up. 

21 Q. Other than adjusting for other background 

22 variables, is there any other opinions that you now 

23 have in the Minnesota case that you arrived at while 

24 reviewing information in working on the Oklahoma 

25 case? 
1 A. There was an issue in the Oklahoma case — I

2 wish I could remember this again more clearly — of 

3 — on what population or subpopulation the 

4 estimation was done, and there was an issue of how 

5 you dealt with the Medicaid population. I'll try to 

6 get to that, and I — I'm a bit fuzzy about it now, 

7 but I believe it was different than was done in 

8 Minnesota and in order to recover that I'd have to 

9 either think harder or go find the Harrison report. 

10 But I believe there was an issue of how it dealt with 

11 the subpopulation of public-aid recipients from NMES 

12 that was done differently than the way it's done in 

13 Minnesota. I didn't review that before today because 

14 it's — I've wanted to keep clear on what was being 

15 done here and keep the other stuff out of my mind but 

16 perhaps I should have glanced through that. 

17 Q. I just want to know what opinion you have about 

18 the Minnesota case now. 

19 A. Yeah, I understand that. 

20 Q. If you — 

21 A. But you asked the question based on what I read 

22 in Oklahoma. 

23 Q. Right. And so I mean do you have a different 

24 opinion about the Minnesota case having looked at 

25 this issue maybe of some populations in NMES? 
1 A. It's — it's possible because I think the

2 estimation was done, the estimation in NMES was done 

3 not only using more variables but I think with a 

4 different way of dealing with the public aid Medicaid 

5 subpopulations in NMES. I just don't have it clearly 

6 in mind. I apologize. 

7 Q. I take it you have don't have a clear opinion on

8 the Minnesota case in mind on that issue right now. 

9 A. I guess the clear opinion would be there is an 

10 issue to be thought about and perhaps the analysis 

11 that was done in Minnesota, since my memory is that 

12 it differs from the analysis that was done by

13 Harrison for Oklahoma, perhaps the Minnesota analysis 

14 is less good, less relevant than the Oklahoma 

15 analysis. In order to decide that, I have to become 

16 much clearer about the way the Oklahoma analysis was 

17 done and if in fact it does deal with that 

18 subpopulation issue differently, which I believe it 

19 does. 

20 Q. Anything else, any other opinions about the 

21 Minnesota case that you arrived at as a result of 

22 your work and review of information in the Oklahoma 

23 case? 

24 A. Well one goes back to the missing-data problem. 

25 I don't know if this really affects my opinion to 
1 handle what was done in handling missing data in

2 Minnesota, but Harrison tries to avoid any imputation 

3 at all because he states, I believe, somewhere in the 

4 report that it's subject to dispute and he doesn't 

5 want to get into the kinds of problems, he views I 

6 guess as problems, that, for example, arise in 

7 Minnesota. He does some recoding of variables which 

8 he thinks gets around the missing-data problem and 

9 then discards subjects that have missing data after 

10 he has done this arbitrary coding, and so-called 

11 complete case analysis, although it really isn't. It 

12 has problems. 

13 His analysis has problems but in some ways they 

14 are different problems from the way they are handling 

15 missing data in Minnesota. Does that affect my 

16 opinion of what was done in Minnesota? It doesn't 

17 affect my opinion of what was done. I've always had 

18 the opinions that have been written down there on the 

19 way they handled missing data which I'm quite 

20 critical of, but it — the report from Harrison 

21 suggests that other people are now aware of the 

22 problems as well since he is trying to avoid getting 

23 in the same predicament. 

24 Q. Looking back at the pre-designation of documents 

25 for this deposition, Professor Rubin, Exhibit 3547,
1 item number 2 was the supplemental expert report of

2 Professor McCall. 

3 A. Right. 

4 Q. I believe you told me you have reviewed that 

5 now. 

6 A. Yes, I have. 

7 Q. Have you reviewed Professor McCall's original 

8 expert report? 

9 A. No, I have not. 

10 Q. All right. 

11 A. It was — But I did review the supplement and so 

12 I don't have a good tie between certain parts of the 

13 supplement and the points made in the original 

14 report. 

15 Q. I understand. When did you first review 

16 Professor McCall's supplemental report? 

17 A. I believe that was on Saturday, just this past 

18 weekend. 

19 Q. How much time did you spend reviewing it? 

20 A. I — Saturday and yesterday, maybe an hour in 

21 total, although it's a short document. The way it's 

22 tied, it's hard to make contact for me for some 

23 reason, so I probably spent about an hour. 

24 Q. Item number 3 is the supplemental report of 

25 William E. Wecker dated January 15, 1998. 

1 A. Right. 

2 Q. You saw that as well? 

3 A. Yes. 

4 Q. Did you see Dr. Wecker's original expert report? 

5 A. No, I did not. 

6 Q. You have never seen that? 

7 A. I've never seen it. 

8 Q. When did you first see Dr. Wecker's supplemental 

9 report? 

10 A. Same time as the Brian McCall, Saturday. 

11 (Interruption by the reporter.) 

12 Q. And Sunday, too, I take it? 

13 A. Yes. 

14 Q. This past weekend? 

15 A. Correct. 

16 Q. How much time did you spend reviewing Dr. 

17 Wecker's supplemental report? 

18 A. Probably about the same amount of time. In some 

19 sense, it might be more time, it might be less, 

20 depending on how you count time. What I mean by that 

21 is, both documents are very short but then the time 

22 spent looking at that and trying to look at other 

23 materials that tied into that and thinking about how 

24 it relates to other things, going back and looking, 

25 for example, at the supplemental report of the Zeger 
1 team, the original report of Zeger team, so it's

2 probably more time total in — where that was the 

3 stimulus to think about things. That sentence makes 

4 sense, sort of. 

5 MR. LOVE: I'll mark this next. 

6 (Plaintiffs' Deposition Exhibit 3548 was 

7 marked for identification.) 

8 BY MR. LOVE: 

9 Q. Professor Rubin, I'll show you what we have 

10 marked as Exhibit 3548 to your deposition. It's two 

11 — one figure but takes up two pages, and then three 

12 pages of footnotes that go with it; is that correct? 

13 A. That is correct. There is a whole collection of 

14 footnotes going up to whatever letter it is, U. 

15 Q. From A through U. 

16 A. Yeah, yes. 

17 Q. And I understand from Mr. Biersteker that this 

18 is the only demonstrative exhibit in the Minnesota 

19 case that you were involved with. 

20 A. Correct. 

21 MR. LOVE: Is that correct, Peter? 

22 MR. BIERSTEKER: I suppose — I believe 

23 that's correct. Let me say this: The simulation 

24 that was the electronic media attached to Professor 

25 Rubin's first report does generate graphical displays 
1 that I suppose you could count as a demonstrative,

2 although I'm not sure they were designated as such. 

3 Okay? That's my own — that's my only hesitation. 

4 MR. LOVE: To the best of your 

5 recollection, then, the ones you demonstrated as 


6 demonstrative is the only one he was involved with

7 MR. BIERSTEKER: That's correct.

8 MR. LOVE: What we have just marked as


9 3548? 

10 MR. BIERSTEKER: I believe that's correct. 

11 Q. With that understanding, Professor Rubin, I'm

12 not going to show you hundreds of demonstrative 

13 exhibits and ask you if you were involved in them but 

14 I will talk to you about this one that you were 

15 involved in. 


16 A. Okay, fine.

17 Q. First of all, let me confirm you were involved


18 in preparing this exhibit, 3548? 

19 A. Absolutely. 

20 Q. And the two-page figure, is it meant to be laid 

21 side by side or how is it used? 

22 A. Yes. The first one goes to the left and the 

23 second one goes to the right, so you will notice that 

24 they are lined up so the data sources, which is the 

25 first column labeled data sources, each is the data
1 source that was used in some of the Zeger et al

2 analyses and so they run across the row, so NMES with 

3 the supplement runs across. That's the first row. 

4 The second row is the NMES without supplement, goes 

5 all the way across. Each column, that's after the 

6 data source column, which is the title, refers to 

7 some subset of variables that are being used in some 

8 analyses somewhere by the Zeger team, so each column 

9 refers to a block of variables, or maybe only one 

10 variable. Like the second column is labeled race, 

11 that refers to the variable race, which is available 

12 in all the data sets. 

13 Q. Okay. 

14 A. And whereas something like smoking status, which 

15 is also of — a variable, actually a small collection 

16 of variables, is available in NMES but not in the 

17 supplement. It is available in BRFSS but not 

18 available in billing records, et cetera. 

19 Q. Okay. I realize that the top of this Figure 1 

20 describes to some extent what the chart does but I 

21 want to make sure I have it straight in my mind. If 

22 a box in this chart is white, what does that mean? 

23 A. That means that there are data available there 

24 in that — the data source, which is the column — 

25 look at age and gender to be clear. 
1 Q. Sure.

2 A. The first upper left corner, that's the NMES 

3 with the supplement and look at age and gender, they 

4 are almost always fully observed. That's why it's 

5 blank. Then the next one is race and that's blank, 

6 that means it's almost always fully observed. That's 

7 why it's blank. 

8 Q. The next one says "Smoking Status" and there is 

9 a "5% or 9%" listed above that. 

10 A. Right. I believe footnote A explains that, I 

11 hope it does. It has to do with the way they do the 

12 coding. Footnote explains that and says, "The 

13 percentage of missing values depends upon the type of 

14 analysis," is one of the things awkward about the 

15 analysis they are doing, the analyses tends to be 

16 moving targets. "It is 5% when smoking status is 

17 coded as Ever smoker vs Never smoker and it is 9% 

18 when smoking status is coded as Current, Former or 

19 Never smoker. Subjects with missing smoking data are 

20 treated as former smokers by the plaintiffs," so 

21 that's one of these what I would regard as an 

22 arbitrary imputation. 
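
[Illustration, not part of the testimony. The practice described in the footnote, coding subjects with missing smoking data as former smokers, can be sketched on made-up data to show how it shifts the category shares. The twenty statuses below are hypothetical and are not drawn from NMES or from any analysis in the case.]

```python
def missing_rate(statuses):
    """Fraction of subjects whose smoking status is missing (None)."""
    return sum(s is None for s in statuses) / len(statuses)

def impute_as_former(statuses):
    """Single arbitrary imputation: every missing status becomes 'former'."""
    return ["former" if s is None else s for s in statuses]

def share(statuses, category):
    """Share of `category` among subjects with a non-missing status."""
    observed = [s for s in statuses if s is not None]
    return observed.count(category) / len(observed)

# Hypothetical sample: 5 current, 4 former, 9 never, 2 missing.
statuses = ["current"] * 5 + ["former"] * 4 + ["never"] * 9 + [None] * 2

rate = missing_rate(statuses)
former_observed = share(statuses, "former")                   # missing cases set aside
former_imputed = share(impute_as_former(statuses), "former")  # missing coded as former
print(rate, round(former_observed, 3), round(former_imputed, 3))
```

[Here the share of former smokers rises once the missing cases are assigned to that category, and nothing in the filled-in data records that those values were never observed.]
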

23 Q. All right. Then the next column in that same 

24 first row is blue for "Physician Diagnosis of Medical 

25 Condition." What does that mean? 
1 A. That means do we have a variable that describes

2 from the physician the diagnosis of the person's 

3 medical condition, and that's not in NMES. We don't 

4 have that. We have self-report, which is two columns 

5 over. 

6 Q. And blue means there is no data at all? 

7 A. There is no data, right, there is in physician. 

8 Where they did NMES, they didn't say get a report 

9 from your doctor, Xerox it and mail it in and we will

10 code it in NMES. They didn't do that. Nor did they 

11 get a recording of the amount reimbursed to the 

12 provider. But we have a physician diagnosis as a 

13 medical condition, not in NMES or in BRFSS but in the 

14 billing and enrollment data, Medicaid and Blue 

15 Cross/Blue Shield data we actually have a physician 

16 diagnosis of the medical condition, and that's why 

17 those squares are blank, indicating that they — 

18 there are real data there, may have errors in them 

19 but at least there appear to be real data there. 

20 Q. And then the — There is only three, let's say, 

21 colors of boxes on this chart, correct, white or 

22 blank, blue and green? 

23 A. Correct. 

24 Q. And what does green mean? 

25 A. Green means the data are not used in any 
1 analyses that we could figure out, at least, although

2 they were available in the database. 

3 Q. Does it mean that the data is the same in the 

4 blue characteristic where it's almost entirely 

5 available? 

6 A. Blue means it's almost entirely missing, it's 

7 gone. 

8 Q. I'm sorry, the same as the white, meaning it's 

9 almost entirely available?

10 A. Correct, almost entirely available. In some 

11 cases it shows a footnote that shows it's missing. 

12 For example, in BRFSS, self-report of physician 

13 communication and medical conditions, there is a 

14 green box there. It's not used. And I believe the 

15 95 percent means 95 percent missing, so that's — may 

16 be the reason why it wasn't used in that particular 

17 — in the BRFSS analysis, although it was used in 

18 NMES. Am I being clear? 

19 Q. Look to footnote F in that particular box if you 

20 wanted to know an explanation — 


21 A. Correct.

22 Q. — why it wasn't used there?

23 A. Correct, what is going on there.

24 Q. Professor Rubin, what is the significance of


25 this exhibit? 
1 A. Well the significance can be described in

2 several ways. The first way, which is just as 

3 important to me having done these kinds of things for 

4 a long time, is that there is a — there is a big 

5 compound data set. What I mean by "compound," it 

6 draws data from different sources and it uses all the 

7 data to try to reach some conclusion, and since all 

8 these databases are being used for some piece of the 

9 analysis, ideally there would be no missing data, 

10 there would be no blues, no percentages, everything 

11 would be complete and then an analysis would be done 

12 of the sort that the perhaps Zeger would do, there 

13 are analyses — The Zeger team would have done an 

14 analysis based on that complete data. I think 

15 everybody agrees, I believe, that wouldn't it be nice 

16 if we had, for example, in BRFSS and the billing 

17 records all the — in the Minnesota billing records 

18 all the information on everybody, on everybody in the 

19 state, for example, so instead we don't have that. 

20 We have missing data various places and so the kinds 

21 of analyses that have to be done try to piece 

22 together in some synthetic way — I think that's the 

23 jargon in statistics and what's used in the Zeger 

24 report to describe it as well — to get synthetic 

25 estimates, because we don't have all this 
1 information.

2 Q. And typically in applied statistics, anyway, if 

3 a problem is reasonably complex, you often have the 

4 situation where you don't have data sets that have 

5 all the information that you would like to have in 

6 one data set and there is no missing values in any of

7 those? 

8 A. Correct. 

9 Q. Data. 

10 A. That's right. It's — Let me separate that, 

11 though. It's extremely common to have some missing 

12 values in the data set you are dealing with, that's 

13 extremely common. 

14 Q. That's where if it's a survey, for instance, 

15 some respondent did not answer every question? 

16 A. Correct. So for example if we look at NMES and 

17 look at the columns self-reported physician, it's 

18 missing 3 percent of the time. If you look at total 

19 medical expenditure from provider, which is dollars, 

20 it's missing 53 percent of the time, so that's NMES. 

21 That's not uncommon. In NHANES, it's a survey I'm 

22 more familiar, there is unit nonresponse and item 

23 nonresponse. That's very common. It's less common 

24 that an analysis will require putting together many 

25 data sets from different sources to reach a 
1 conclusion, that's less common, and when you do that,

2 it is true that you often have these blocks of blue 

3 because one data set will have some kind of 

4 information that's not available in another set. So 

5 I just want to be clear there are two kinds of 

6 answers because what is very common is missing data 

7 due to unit nonresponse or item nonresponse, putting 

8 data sets together to form one giant data set to form 

9 an analysis is less common but when you do that it's 

10 quite common there would be blocks of missing data. 
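
[Illustration, not part of the testimony. The two kinds of missing data distinguished above, scattered item nonresponse within one data set versus whole blocks missing when several sources are put together, can be sketched as a table of missing fractions by source and variable. The source names, variable names, and values are hypothetical.]

```python
def missing_fraction(rows, variable):
    """Fraction of rows in which `variable` is absent or None."""
    if not rows:
        return 1.0
    return sum(r.get(variable) is None for r in rows) / len(rows)

def missingness_map(sources, variables):
    """Missing fraction for every (source, variable) pair."""
    return {
        name: {v: missing_fraction(rows, v) for v in variables}
        for name, rows in sources.items()
    }

# Hypothetical sources: a survey with one unanswered item, and billing
# records that never collected smoking status at all.
sources = {
    "survey": [
        {"age": 40, "smoker": True},
        {"age": 55, "smoker": None},  # item nonresponse
    ],
    "billing": [
        {"age": 40, "diagnosis": "dx_a"},
        {"age": 61, "diagnosis": "dx_b"},
    ],
}

table = missingness_map(sources, ["age", "smoker", "diagnosis"])
for name, row in table.items():
    print(name, row)
```

[A fraction of 1.0 corresponds to a fully missing block for that source, while a fraction strictly between 0 and 1 reflects ordinary item nonresponse.]
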

11 Q. Beyond what you just told me, is there anything 

12 you are trying to illustrate to the jury with this 

13 exhibit? 

14 A. Well to illustrate that the massive amount of 

15 missing data here, the massive amount of blue, which 

16 means that synthetic estimation based on assumptions, 

17 where the assumptions come from outside the data 

18 themselves, have to be made in order to draw the 

19 kinds of conclusions in order to do the kinds of 

20 analyses that you want to do. Because there is so 

21 much missing data, that you have to do something 

22 else. That's one point. That's the blue boxes. 

23 Q. Let me ask you about that. 

24 A. Okay. There is another point as well having to 

25 do with the white boxes. 
1 Q. Let's ask about the blue boxes first of all.

2 Isn't it true that — I'll withdraw that. 

3 Professor Rubin, are you aware of any data sets 

4 that do not contain blue boxes that could have been 

5 used in this analysis? 

6 MR. BIERSTEKER: Object to the form. 

7 A. I'm not aware of any existing data sets but the 

8 existence of these — of these blue boxes for the 

9 Minnesota-specific data suggests that perhaps some 

10 data should have been collected, especially in an 

11 issue of this importance, to get real data. 

12 Q. And I understand that it's always possible to go 

13 out and try to create new data sets but my question 

14 really was focused on existing data sets. You were 

15 right. As I understand it, you are not aware of any 

16 existing data sets that would have all the 

17 information that's shown in the various columns in 

18 Exhibit 3548 and had you put that data set on this 

19 chart would not have blue boxes showing missing data, 

20 entirely missing data? 

21 A. Correct, correct. It does emphasize the 

22 desirability if this issue is regarded as important 

23 to try to collect data, gather data, assemble data 

24 from existing records that doesn't have all that 

25 blue. 

STIREWALT & ASSOCIATES 

P.O. BOX 18188, MINNEAPOLIS, MN 55418 1-800-553-1953 




1 Q. Have you explored the concept of gathering 

2 additional data or creating additional data sets for 

3 the Minnesota situation? 

4 A. Explored in the sense of being active and making 

5 specific suggestions beyond saying look at BRFSS for 

6 Minnesota, billing records for Minnesota, wouldn't it 

7 be nice to do a survey of those people and collect 

8 background information on this kind of information 

9 that's in fact missing? No, I haven't, I haven't 

10 done anything beyond that general, broad statement. 

11 There is nothing more than saying wouldn't it be nice 

12 to make an attempt to fill in these blue boxes from 

13 specific data that's relevant to those rows, those 

14 data sources. 

15 Q. Right. And with NMES, for instance, that was 

16 done in 1987; correct? 

17 A. NMES was done in 1987; correct. 

18 Q. Do you believe it would be possible to go back 

19 and find those people and fill in the blue boxes? 

20 A. Well that wouldn't be necessary if the 

21 Minnesota-specific data had that information. My 

22 understanding of these analyses is that the reason 

23 why NMES was — was used is because it had this 

24 detailed information, the white boxes there going 

25 across, so a model could be built and then 
1 extrapolated to or grafted onto the Minnesota data

2 that was — that was wholly deficient in that way. 

3 Q. Well let's look at the billing data shown on 

4 here — 

5 A. Correct. 

6 Q. — Exhibit 3548. 

7 A. Okay. 

8 Q. You understand that that billing data goes back 

9 as far as 1978, perhaps, and certainly back into the 

10 early 1980s? 

11 A. I understand it goes back a ways, yes. 

12 Q. Is it — Do you believe we can go back and find 

13 all those people and fill in all the missing values? 

14 A. No, I'm not claiming that would be possible, to 

15 find all those people, or even a simple random sample 

16 of those people, a random sample that would be easily 

17 accessible, but presumably as you become closer in 

18 time to the present, it becomes easier and easier to 

19 do that. As you get towards the late '80s, '87, '88 

20 '89, '90, '91, '92, so forth, it becomes easier to 

21 get data on those people and that would have been 

22 nice to have all those blue boxes filled in with 

23 random samples of people from those years themselves. 

24 Q. When you say it's easier to get data, you mean 

25 it's easier today to go back and find those people 
1 and get the data from them now?

2 A. Presumably it's easier. I'm not an expert at 

3 all on what these databases look like, what the 

4 billing records look like or what BRFSS looks like. 

5 Q. Let me ask you — 

6 A. But — 

7 Q. — right there, professor, have you ever looked 

8 at the claims data in this case? 

9 A. No, I have not. 

10 Q. Have you ever looked at the BRFSS data in this 

11 case? 

12 A. No, I have not, but I was going to say that it's 

13 not really necessary to get the exact same people. 

14 If you want to get relationships, you can get 

15 relationships, billing records in Minnesota to these 

16 background characteristics from people that are more 

17 current if you wanted to do that and then apply those 

18 relationships that you see, say from people in 1993, 

19 and see how Minnesota Medicaid recipients, Blue

20 Cross/Blue Shield recipients, how they — how their 

21 expenditures and billing records relate to their — 

22 these other variables that are available in NMES. 

23 And one could argue that generalizing from 

24 Minnesotans in 1992, and let's say they were done in 

25 1992, to Minnesotans going back in time, and these 
1 were all Medicaid, Blue Cross/Blue Shield recipients,

2 it's a better idea, less extrapolation, than taking a

3 national NMES database for 1987 and extrapolating 

4 that to all Minnesotans going back in time, back in 

5 the '70s, which is what the current process is. 

6 Q. Do you have an expert opinion as to whether it's 

7 practical to go back and get enough data from 1993 

8 and subsequent years from the claims data to do what 

9 you just suggest? 

10 MR. BIERSTEKER: Object to the form. 

11 A. I've been involved in things like — in surveys 

12 like that, helping to design surveys like that. If 

13 you have billing records and you have in recent 

14 times, mid '90s, and you had people's names and 

15 addresses and you wanted to collect NMES-type 

16 information on them, you could design a sample and 

17 you could draw them and you could find out, you could 

18 try to get them in, get the information, get the 

19 NMES-type information. I don't know. Presumably the 

20 billing records have names and addresses and maybe 

21 more information but it's not — certainly that's 

22 what survey people do all the time, but I have no 

23 specific knowledge about these billing records. 

24 Q. Other than the various calculations that are 

25 found in your original expert report and your 
1 supplemental report and these two other exhibits that

2 — from your previous deposition, Exhibit 3544 and

3 3545, and any calculations we just looked at in 

4 Exhibit 3548, are there any other calculations that 

5 you have done in connection with the Minnesota case? 

6 A. Any calculations that I directed, let's say? 

7 Q. Sure. 

8 A. Okay. 

9 Q. Done yourself or directed others to do. 

10 A. I don't believe so. Specifically for 

11 Minnesota. 

12 Q. Right. Are there any calculations that you 

13 believe apply to Minnesota other than what we found 

14 in these exhibits that we just identified? 

15 A. The only reason I'm hesitating is, there were 

16 some similar propensity score calculations that were 

17 done, I believe for Texas, using a different set of 

18 background variables because it was relevant to the 

19 Texas case, and I don't remember whether in fact we 

20 used a different set of background variables or the 

21 same, but it's a similar kind of analysis, also from 

22 NMES, getting to the same point. But I think my 

23 memory is that in fact we may have actually used the 

24 Minnesota variables for that because that was what 

25 was available at that time. 
1 Q. Wouldn't change your opinion about propensity

2 scores —

3 A. No.

4 Q. — how they apply to the Minnesota case?

5 A. It would not.

6 MR. LOVE: Take a break?

7 MR. BIERSTEKER: That's fine.

8 (Recess taken from 11:02 to 11:12 a.m.)

9 BY MR. LOVE:

10 Q. Professor Rubin, have you performed any analysis


11 for the Minnesota case that's not described in your 

12 original expert report, your supplemental expert 

13 report, the two Exhibit 33 — I'm sorry — 3544 and 

14 3545 and this last exhibit we just talked about, 

15 3548? 

16 A. No, I've not. 

17 Q. Have you worked on any charts or graphs or 

18 tables or diagrams not shown in those documents we 

19 just described for the Minnesota case? 

20 A. I don't believe so. Let me — There may have 

21 been a question asked of me about something someone 

22 used as input to such a thing but I've not seen a 

23 chart. I have — I have some notes that I made up 

24 but they are not — they are not displayed, just to 

25 remind me what all the analyses are and stuff, but 
1 that's — those aren't analyses and statistical

2 analyses, as I think you are talking about, analyzing 

3 the data. 

4 Q. They are notes to remind you about things? 

5 A. Of what all the analyses that they did are in 

6 outline form. 

7 Q. May I see that, please? 

8 A. Yes. I have a copy here. 

9 Q. Great, thank you. 

10 A. I'm not sure it's right. 

11 Q. That's what you have? 

12 A. Yeah, it's what I've put together. I think this 

13 is another copy. I have two copies. They are 

14 putting together with conversations with — with my 

15 data person. 

16 Q. Who is your data person? 

17 A. This is Raghunathan, R-A-G-H-U-N-A-T-H-A-N. 

18 Q. The same person who — 

19 A. Correct. 

20 Q. — did the calculations we saw on Exhibits 3544 

21 and 3545?

22 A. Correct, same person. 

23 MR. LOVE: Let's make this as the next 

24 exhibit. 

25 (Plaintiffs' Deposition Exhibit 3549 was 
1 marked for identification.)

2 BY MR. LOVE:

3 Q. Professor Rubin, I'll show you what we have

4 marked as Plaintiffs' Exhibit 3549.

5 A. Correct.

6 Q. Is that a copy of the notes you just handed to

7 me?

8 A. Yes, it is.

9 Q. Okay. And you have a second copy of that there.

10 A. Yes, I do.

11 Q. So if we talk about that, you can use your copy

12 and I can use the marked copy.

13 A. Fine.

14 Q. Can you tell me just generally what this exhibit

15 is?

16 A. It was an attempt to understand and summarize

17 the collection of analyses that are done in the Zeger

18 report. The reason for doing this is, the Zeger

19 report as written is very thin on describing what is

20 actually done in many cases and so it was an attempt

21 to get a better overview feeling for — for what is

22 done and how it changes by type of analysis they are

23 trying to do.

24 Q. Did you have an opportunity to review the

25 computer data that was submitted along the Zeger
1 reports, the original and supplemental?

2 A. I did not myself but Raghunathan did and so this

3 is the result of many back and forths with him trying

4 to figure out what was actually done in the analyses.

5 Q. And you are relying on him for the actual

6 calculations that are behind the Zeger reports?

7 MR. BIERSTEKER: Object to the form.

8 A. Calculations in the sense of reading the code to

9 figure out what calculations they were doing rather

10 than — it's not two plus three kind of calculations,

11 it's calculating what they were — are they doing A

12 plus B here or doing C plus Z here. So I haven't —

13 I'll stop there.

14 Q. Is there any other work that you performed in

15 connection with the Minnesota case since writing your

16 supplemental report dated January 10, 1988 that you

17 haven't told me about already this morning?

18 A. Work that I did, that would include reviewing

19 documents, I reread depositions, I obviously talked

20 to Peter Biersteker. That's work but did it produce

21 a document or an analysis? No, that I haven't told

22 you about.

23 Q. No further documents or analysis?

24 A. I don't believe so.

25 Q. Did you review the transcript of the deposition
1 of Dr. Wyant that was taken in January of this year?

2 I think it was January 26th. 

3 A. Yes, I did, shortly after it became available, 

4 so it's not — I glanced at parts of it yesterday and 

5 the day before, but I did review it, yes. 

6 Q. I take it you did review the supplemental report 

7 of Drs. Zeger, Wyant and Miller dated —

8 A. Yes, I did. 

9 Q. — November 17, 1997? 

10 A. Yes. 

11 (Plaintiffs' Deposition Exhibit 3550 was 

12 marked for identification.) 

13 BY MR. LOVE: 

14 Q. Dr. Rubin, I'll show you what we have marked as 

15 Plaintiffs' Exhibit 3550, titled Smoking Attributable 

16 Health Care Expenditures: Blue Cross, Blue Shield of 

17 Minnesota and State of Minnesota, 1978-1996, 

18 Supplemental Expert Report of: Drs. Zeger, Wyant and 

19 Miller, November 17, 1997. 

20 A. Okay. 

21 Q. Is that the supplemental report of the 

22 plaintiffs' experts that you reviewed in this case? 

23 A. It looks to be the same one that I read before. 

24 Yes, I think it is. 

25 Q. And the data that came with that report you gave 
1 to Mr. Raghunathan?

2 A. Raghunathan, yes. 

3 Q. Raghunathan, and he analyzed that for you? 

4 A. Analyzed in the sense of did whatever analysis 

5 was done, yes. 

6 Q. Have you reviewed any of the transcripts from 

7 the trial in this case? 

8 A. Yes, I have. 

9 Q. What have you reviewed? 

10 A. I reviewed Zeger, I reviewed Wyant, I may have 

11 glanced at part of Samet. There may have been pieces 

12 of other ones I looked at as well but not with the 

13 care that I looked at Zeger and Wyant. 

14 Q. When did you review the Zeger transcript, trial 

15 transcript? 

16 A. I think shortly after they became available, so 

17 like — I could try to look at a calendar to try to 

18 figure that out better but I — my guess would be the 

19 end of January. 

20 Q. If I told you they actually testified at trial 

21 the last few days of February — 

22 A. Of February. 

23 Q. — of 1998. 

24 A. Okay, then it would have to be — then it would 

25 have to be within a week or two after that. 
1 Q. Okay. And the same is true for Dr. Wyant.

2 A. Correct. 

3 Q. He testified shortly after Dr. Zeger. 

4 A. Right. I remember that. So it would be — In 

5 fact, perhaps it was even days after they testified. 

6 Q. Did you review — 

7 A. It was days after they testified, yes, within a 

8 day, yeah. It was very fast. 

9 Q. So it would have been early March? 

10 A. Yeah. 

11 Q. Did you review them again in preparing for your 

12 deposition today? 

13 A. I did not really review them. I think I glanced 

14 through them to — I did look. They were open. But 

15 "reviewed" suggests I actually started at the 

16 beginning again and read the whole thing and that I 

17 did not do. 

18 Q. Are there any particular parts of those 

19 transcripts that you did look at more — in more 

20 detail to prepare for your deposition today? 

21 A. I'm trying to remember. I think there was one 

22 part of Wyant where I read one page. I don't even 

23 remember what the topic was right now. 

24 Q. Okay. If you look back at Exhibit 34 — I'm 

25 sorry — 3546, your supplemental report. 
1 A. Yes.

2 Q. Again in particular looking at topic Roman 

3 numeral I, Propensity Score Methods. 

4 A. Correct. 

5 Q. With respect to that section of your 

6 supplemental report, did you rely in any way on the 

7 supplemental report of Drs. Zeger, Wyant and Miller, 

8 Exhibit 3550? 

9 A. I don't believe any more than I relied on the 

10 first report. 

11 Q. That was really my question, Professor Rubin, is

12 whether the analysis and discussion here in section I 

13 of your supplemental report is really based on what 

14 was set forth in the original report and nothing 

15 different than was set forth in the supplemental 

16 report. 

17 A. I think, yeah, that's basically true. Perhaps 

18 the only relevant change to my analyses that I report 

19 on in this section would be the fact they don't do

20 this testimation, bring variables in, knocking them 

21 out, as much as just bringing everything in. 

22 Q. What they call the full model in the 

23 supplemental report? 

24 A. Full model; correct. 

25 Q. And when you did your propensity score analysis 
1 set forth in your supplemental report, did it matter

2 whether Drs. Zeger, Wyant and Miller used testimation 

3 or full model approach? 

4 A. No, because the issue is, I have these group — 

5 this group of smokers and nonsmokers within the cells 

6 and how much they differ on this collection of 

7 background variables that you are using to describe 

8 like, what "like" means, and therefore you want to 

9 adjust for differences in those background variables, 

10 and whichever way they tried to adjust, whether the 

11 original or the supplemental way, comes under the 

12 same criticism that's — criticism that's revealed by 

13 the analyses that I do there. 

14 Q. And does your propensity score methods section 

15 of your supplemental report and the analysis you did 

16 there depend in any way on the relative errors that 

17 are set forth on page 6 of the supplemental report of 


18 Zeger, Wyant and Miller, Exhibit 3550?

19 MR. BIERSTEKER: I'll object to the form.

20 MR. LOVE: I'm sorry, the wrong — I think

21 that's the right page.

22 A. Not in any direct way. In an indirect way — 

23 Let me make sure I'm answering the question. 

24 Q. Let me ask a different question. 

25 A. Fine. 
1 Q. Because I think I understand something you were

2 trying to convey. 

3 Other than the fact that those relative errors 

4 were based in part upon NMES data — 

5 A. Uh-huh. 

6 Q. — is there any other way in which the relative 

7 errors reported on page 6 of the — of Exhibit 3550 

8 affected your discussion in section I of your 

9 supplemental report, propensity score methods? 

10 A. No, not in the way they affected the analysis of 

11 the report there. However, the — if you couple 

12 those very large relative errors with the results 

13 there that are in the propensity score method 

14 section, it has perhaps more impact on what you can 

15 draw from the — conclusions you can draw from the 

16 analyses that they did. 

17 Q. Did you describe in your supplemental expert 

18 report any impact of comparing the propensity score 

19 analysis with the relative errors reported in the 

20 Zeger report? 


21 A. No, I did not; no, I did not.

22 Q. Did you make — Did you give any consideration


23 to looking at the relative errors in the Zeger report 

24 at the time you wrote your supplemental report? 

25 MR. BIERSTEKER: Object to the form. 
1 A. Well in a general sort of way, yes. Just as

2 part of doing statistics when you are trying to 

3 adjust for background variables, one of the issues 

4 that has to be in the back of your mind is how well 

5 relationships are estimated, relationships with the 

6 background variables to smoking and nonsmoking in 

7 relationship to these background variables to 

8 outcomes such as healthcare costs. So, those — 

9 those relationships, if they are poorly estimated,

10 are — have a general kind of effect on how important 

11 the problem — what the impact of the problem would 

12 be, but they didn't affect the analyses that I'm 

13 doing here, which is solely focused on how much the 

14 background variables differ, the distribution of them 

15 differ between smokers and nonsmokers within these 

16 cells. 

17 Q. Referring to the analyses you are doing here, 

18 you mean in section I of your supplemental report? 

19 A. Correct. 

20 Q. And based on the chronology that we have been 

21 discussing today, I take it that you did not rely on 

22 the expert reports of any of the other defense 

23 experts in this case, in particular Dr. Wecker or Dr. 

24 McCall, when you wrote your supplemental report? 

25 A. Correct, I haven't seen any. 
1 Q. Did you rely in any way on any information

2 provided by Dr. Samet when you provided your 

3 supplemental expert report? 

4 A. I don't believe so. Do you mean provided by 

5 Samet when I read testimony? This is other than — 

6 than is in — 

7 Q. Either in Dr. Samet's reports or testimony or 

8 deposition or anywhere — 

9 A. No, I don't believe so. Well except to the 

10 extent that Samet is an epidemiologist and was aware 

11 of the need to adjust for confounding variables, so 

12 that's what this analysis is about, how well can you 

13 adjust for this collection of confounding variables 

14 that apparently that team has agreed should be 

15 adjusted for, and Samet was involved in, presumably 

16 to some extent at least, maybe limited, in deciding 

17 what variables, what background variables should be 

18 in there to adjust for, and so this analysis is 

19 saying can you adjust for that set that that team 

20 agreed on should be adjusted for and how well can you 

21 do that, and that's what these analyses, propensity 

22 score analyses address. They address that question. 

23 So in that very general sense, I guess I'm relying on 

24 something that Samet said. 

25 Is that helpful or just rambling? I don't mean 
1 it to be rambling.

2 Q. I know. 

3 Was there anything specific that Dr. Samet said 

4 that you are relying on? 

5 A. No. 

6 Q. Did you consult with anyone other than perhaps 

7 Dr. Raghunathan in determining whether to conduct 

8 propensity score analyses in this case? 

9 A. No, I did not. 

10 Q. Did you consult with anyone other than him in 

11 performing a propensity score analysis in this case? 

12 A. No, I did not. Well only in the general sense. 

13 Again that I talked to Peter Biersteker about why I 

14 thought this would be a relevant thing to do because 

15 what the game is is comparing like with like and 

16 background variables have different distributions in 

17 the two groups. You have to be very careful about 

18 how you do that, and that's what these techniques, 

19 propensity score techniques are designed to do. 

20 Q. But other than that, you didn't consult with 

21 anyone else? 

22 A. Correct. 

23 Q. Now what data sets did you use in this 

24 propensity score analysis set forth in section I of 

25 your supplemental expert report? 
1 A. This is a data set that the Zeger group provided

2 to us on NMES and it was a data set with their 

3 imputations, even the ones I regard as completely 

4 wrong, had just accepted it because I didn't want to 

5 get into that issue, said okay, let's take that data 

6 set that they used and they are using to make these 

7 adjustments, what is — what's the evidence that 

8 those adjustments are being well done or poorly done, 

9 accepting the data that they created. 

10 Q. Did you perform a propensity score analysis on 

11 any data sets from the behavioral risk factor 

12 surveillance system data sets in this case? 

13 (Interruption by the reporter.) 

14 A. No, I did not. 

15 Q. Did you perform any propensity score analysis on 

16 data sets from the National Health and Nutrition 

17 Examination Survey, NHANES? 

18 A. No, I did not. 

19 Q. Did you perform any propensity score analysis on 

20 the Minnesota — the state of Minnesota's claims data 

21 for its Medicaid program or its GMAC program?

22 A. No, I did not, but I should probably clarify at 

23 this point that except for possibly NHANES, the 

24 propensity score analyses would have been irrelevant 

25 in BRFSS and would have been irrelevant in the 
1 billing, Minnesota billing records, because that's

2 not where the SAFs and SAEs were estimated adjusting 

3 for background characteristics. 

4 Q. I'm just asking — 

5 A. Okay. 

6 Q. I want to make sure I understand what your 

7 propensity score analysis was performed on because 

8 it's not completely clear to me from the language in 

9 the report — 

10 A. I understand. 

11 Q. — just what you did. 

12 A. I apologize. 

13 Q. I take it there were no other data sets other 

14 than the NMES data set provided to you by Drs. Zeger, 

15 Wyant and Miller on which you did perform a 

16 propensity score analysis in the Minnesota case? 

17 A. Correct. 

18 Q. Now you told me that you used — Let me back up 

19 a second. 

20 The NMES — You are familiar with the NMES data 

21 set itself, that NMES — 

22 A. In a general sort of way. 

23 Q. — makes available? 

24 A. Yes. 

25 Q. Have you ever examined that data set? 
1 A. Have I personally ever examined the data set?

2 Q. Yes. 

3 A. No, only through other people doing things for 

4 me. I've not held that data set in my lap. 

5 Q. Have you ever asked one of your colleagues to 

6 examine that data set for you? 

7 A. Other than Raghunathan? 

8 Q. Including him. 

9 A. Well these analyses are based on that data set 

10 so I must have asked him to do that. 

11 Q. Well I thought you said the analysis was based 

12 on a NMES data set provided to you by Drs. Zeger, 

13 Wyant and Miller along with whatever imputations were 

14 in the data at that time. 

15 A. I'm sorry, ask the question again. 

16 Q. Sure. I'm trying to distinguish between that 

17 data set that may have some imputations in it as 

18 opposed to the NMES data set that if you went to NMES 

19 itself and said please can I have your data from the 

20 1987 survey? 

21 A. Correct. I think the only data set we had 

22 available was that provided by the Zeger team. 

23 Q. Okay. 

24 A. But I can't be positive about that, but I think 

25 that's the only one I had. 
1 Q. Do you think that's the data set that actually

2 is analyzed in your supplemental report? 

3 A. Yes. 

4 Q. Now does that data set have — Withdraw that. 

5 The NMES survey itself, as you've indicated in 

6 Exhibit 3548, had some missing data in it; correct? 

7 A. Before — At what point? Yes, it has missing 

8 data, absolutely. 

9 Q. And it has missing data of a nonresponse type 

10 where a survey person did not provide answers to all 

11 the questions that they were asked; is that correct? 

12 A. Let me try to clarify. 

13 Q. Sure. 

14 A. There are generally two kinds of missing data 

15 not described, one is called "unit," one is called 

16 "item." A person who provides much information and 

17 leaves some items out, that's called "item," and

18 somebody who doesn't do anything is called "unit" 

19 non-responded, and I suspect you are saying are there 

20 missing data problems of both kinds. 

21 Q. That is what I'm saying, Professor Rubin.

22 A. I'm trying to be helpful. 

23 Q. Thank you. 

24 A. And the NMES, when you think of it as including 

25 the supplement, the supplement can be viewed as a 
1 unit nonresponse because people did not — a group of

2 people did not respond to that, but throughout all of 

3 NMES there is item nonresponse on particular kinds of 

4 things, in particular the medical expenditure 

5 questions. 

6 Q. Did NMES itself or the agency for health care 

7 policy that conducts NMES, make any type of 

8 imputation for that type of missing data, the unit 

9 nonresponse and the item nonresponse? 

10 A. For the missing expenditure data, they did. It 

11 may be for some other things as well but the primary 

12 one that I was concerned with was the expenditure 

13 data. I'd have to scan this display and look at

14 all the footnotes to make sure I'm not messing up on 

15 what other items they may have imputed. I think 

16 there was but I don't remember which one it was that 

17 the agency imputed before releasing it. 

18 Q. But certainly before releasing the data to 

19 anyone including Dr. Zeger, Wyant and Miller, NMES 

20 made its own imputations for some missing —

21 A. Correct. 

22 Q. — values? 

23 A. Correct. 

24 Q. And did the data set that you examined in your 

25 propensity score analysis described in your 
1 supplementary report have those imputations in it?

2 A. Yes, presumably, because that was the data set 

3 that the Zeger team used and the Zeger team just 

4 accepted those imputations. Again I accepted the 

5 imputations as well as the Zeger group imputations 

6 just to disentangle the issues can you adjust for 

7 these variables as they had them in a reliable way 

8 from the issue of missing data. 

9 Q. Is it your testimony that in addition to 

10 whatever imputations that NMES may have made to its 

11 data set, the data set you examined also had

12 additional imputations by Drs. Zeger, Wyant and 

13 Miller? 

14 A. Correct. 

15 Q. What additional imputations were those? 

16 A. Well I can just look at them. Some imputations 

17 were on smoking status, I believe seat belt use, I 

18 believe overweight. Is that from them? I'd have to 

19 take time to look through these. I could give you — 

20 One reason for doing this is so I didn't have to try 

21 to fill my mind with things that are details, so 

22 subjects of missing smoking data treated by former 

23 smokers by plaintiffs, that's an imputation. 

24 Self-report of medical conditions, that's not missing 

25 very much. C is done by the agency. H is poverty 



1 level. What was done there? It was done by the 

2 agency, that was done by the agency. Kind of health 

3 insurance, I believe that was done by the Zeger 

4 team. Yeah, I believe that's right. Marital status 

5 I believe was done by the Zeger team, education I'm 

6 almost sure was done by the Zeger team, I believe 

7 seat belt use was imputed by the Zeger team. It's on 

8 the next page. Check that. Yeah. The way it's 

9 described here isn't quite right. But overweight 

10 status I believe is done by the Zeger team, treated 

11 as a different category, self-report health status. 

12 So, certainly there were those variables that were 

13 imputed by — by the Zeger team in addition to the 

14 ones that were handled by NMES people or the agency. 

15 Q. Can you draw any conclusions about the validity 

16 of using the NMES data supplied by the agency with 

17 only its own imputations for doing the work Zeger,

18 Wyant and Miller did based on your propensity score 

19 analysis of the data set that has not only NMES's own 

20 imputations but also additional imputations by Drs. 

21 Zeger, Wyant and Miller? 

22 MR. BIERSTEKER: Object to the form. 

23 A. I'm not sure I understand the question. Maybe I 

24 can restate it, what I think you are saying. 

25 Q. Sure, if you want to. 
1 A. Do my propensity score analyses reveal anything

2 about the effect of the imputations that were done 

3 for missing data by either the Zeger team or the 

4 agency? No, that's not? That's not it. 

5 Q. I'll try again. 

6 A. Sorry. 

7 Q. The propensity score analysis you did including 

8 imputations made by the agency and imputations made 

9 by Drs. Zeger, Wyant and Miller; correct? 

10 A. Correct. 

11 Q. Does that propensity score analysis tell you 

12 anything about using the NMES data set as provided by 

13 NMES before anybody makes imputations to perform the 

14 kind of — to use in the kind of model that Drs. 

15 Zeger, Wyant and Miller developed apart from their 

16 imputations? 

17 A. I'm not sure I follow the question. 

18 Q. Were you trying to estimate smoking-attributable 

19 health expenditures in this descriptive way you told 

20 me about? 

21 A. Correct. 

22 Q. Do you know from your propensity score analysis 

23 of the data set you were provided whether the NMES 

24 data set that the agency provides could be used for 

25 that purpose in a linear regression fashion? 
1 MR. BIERSTEKER: Object to the form. 

2 A. I'm still not exactly sure. I guess what I — 

3 I'll try to answer it and see if this is. The 

4 analyses that I did, the propensity score analyses I 

5 did in this section, suggests that the data as 

6 provided by the agency and as filled in with missing 

7 data by the Zeger team does not support the use of 

8 the kinds of linear regression or log-linear 

9 regression methods that the Zeger team used to do the 

10 adjustments, it does not support the validity of 

11 those adjustments. 

12 Q. That's the conclusion you draw from your 

13 propensity score analysis? 

14 A. Correct. That the groups, the smokers and 

15 nonsmokers even within these age/sex cells, differ 

16 markedly on the collection of background variables 

17 they want to adjust for and differ to an extent that 

18 previous work says that you cannot trust the kinds of 

19 modeling adjustments that they are making, "they" 

20 meaning the Zeger team were making in their analysis. 

21 Q. My question, Professor Rubin, is: Does your 

22 propensity score analysis lead you to any conclusions 

23 about whether you could use the NMES data set as 

24 provided by the agency without any additional 

25 imputations by the Zeger team to perform the kind of 
1 appropriate regressions and other linear regressions 

2 that they performed? 

3 A. I think I know what you are driving at. The 

4 problem that's revealed by the propensity score 

5 analyses is not a problem with the way the Zeger team 

6 imputed data. It's just a state of nature that the 

7 people who were smokers and nonsmokers, as reported 

8 by NMES, differ substantially in background 

9 characteristics and therefore you cannot do the kind 

10 of adjustment for those background characteristics in 

11 the relatively naive way that they are doing it and get 

12 reliable answers. It won't work. 

13 Q. So — 

14 A. That problem is not created by the imputation 

15 that the Zeger team is doing. The problem is created 

16 by the state of nature. Smokers and nonsmokers 

17 differ relatively substantially with respect to the 

18 background variables that they say they want to 

19 control for. 

20 Q. So it's your testimony that the data set as 

21 supplied by the agency that conducted NMES could not 

22 be used to perform the kinds of linear regressions 

23 that the Zeger team did in their work? 

24 A. Well "can" is a funny word. They did it so it 

25 can be used. When the results come out should you 
1 believe them? No. 

2 Q. Okay. 

3 A. So it can't be used to get reliable results. 

4 But obviously you can do it in the sense I can jump 

5 out a window. It's not very good for my health 

6 but — 

7 Q. But in your opinion, anyone who tried to use the 

8 NMES data set supplied by the agency to perform those 

9 types of linear regression analyses would produce 

10 results that you don't believe are reliable? 

11 MR. BIERSTEKER: Object to form. 

12 A. That's correct, for those kinds of analyses, for 

13 the purpose they were doing them, that's correct, I 

14 would not — I would not believe in those results. 

15 Q. Would your opinion be the same for someone 

16 trying to use the NMES information without even — 

17 with no imputations, not even those made by the 

18 agency? 

19 MR. BIERSTEKER: Object to form. 

20 A. I haven't done the proper propensity score 

21 analysis leaving out imputations, but based on these 

22 analyses, I think the answer would be yes, that even 

23 if I did the proper propensity score analyses, 

24 accounting for the missing data, that I would still 

25 find that smokers and nonsmokers differ substantially 
1 with respect to these background characteristics. In 

2 other words, I don't believe the imputations that 

3 were created by the agency or by the Zeger team made 

4 the groups look far apart. I think they are — I 

5 think smokers and nonsmokers are far apart with 

6 respect to the background variables they want to 

7 control for, and moreover, they may be far apart on 

8 patterns of missing data, which is something else 

9 that if you are going to do the analysis correctly 

10 you should try to adjust for. 

11 Q. If you had the NMES data without any imputations 

12 by the agency or anyone else and performed what you 

13 believed to be proper multiple imputation on that 

14 data set to fill in the unit non-responses and the 

15 item non-responses, do you have an opinion as to 

16 whether a propensity score analysis performed on that 

17 data set would lead you to the same conclusion that 

18 you can't use that data set to perform the kinds of 

19 linear regressions that Drs. Zeger, Wyant and Miller 

20 do for the purpose that they perform them? 

21 A. That's an excellent question. In fact, for the 

22 purpose of propensity score analyses, I would not use 

23 multiple imputations. I would use that later on to 

24 try to get estimates of the SAFs and 

25 smoking-attributable expenditures. But the correct 
1 propensity score analyses to determine whether the 

2 smokers and nonsmokers were far apart with respect to 

3 the background variables would actually leave the 

4 missing data as missing and also try to worry about 

5 whether the smokers and nonsmokers differed with 

6 respect to their patterns of nonresponse. There was 

7 a thesis on that a couple years ago for a student of 

8 mine, probably going through his last stages in — 

9 through the publication process now, but he addresses 

10 that issue of what should be a proper propensity 

11 score analysis, proper adjustment when you have an 

12 exposed and unexposed groups that differ not only in 

13 values of background variables but also perhaps in 

14 the missing data patterns. 

15 Q. So in order to determine whether a data set 

16 should — can be used properly in your opinion to 

17 perform linear regressions, you should look at the 

18 propensity score analysis of the data set without any 

19 imputations; is that correct? 

20 A. Without any imputations created, that's correct, 

21 because it may be that the patterns of nonresponse do 

22 differ between, in this case, smokers and nonsmokers, 

23 in which case comparing like with like doesn't just 

24 mean the same age, the same marital status but the 

25 same proclivity to have — to not respond to 
1 something, not respond to the age question, not 

2 respond to some other question. 

3 Q. Is it possible that the propensity score 

4 analysis on that unimputed or missing value data set 

5 could be either — show better conditions for using 

6 linear regression or poorer conditions using linear 

7 regression than a propensity score analysis on the 

8 same data set after imputation? 

9 A. Good question. Almost certainly, I think I said 

10 this before but to be clear, almost certainly it 

11 would be worse. 

12 Q. Worse before the imputation or after the 

13 imputation? 

14 A. It would be — Yeah. I should say that the 

15 groups would almost certainly be more different doing 

16 the propensity score analysis the correct way leaving 

17 out imputations because not only can the groups 

18 differ on the values of variables that are reported 

19 but moreover they could differ on the patterns of 

20 nonresponse. That's another reason they can be 

21 different. There are more ways which can be 

22 different using the correct analysis. 

23 The way imputations are done, they are probably 

24 making the smokers and nonsmokers look more similar 

25 and not allowing for the fact that they — that they 
1 might have missing data patterns that are different. 

2 I don't know whether they have missing data patterns 

3 that are different. It's like adding another 

4 variable to adjust for in there and if it turns out 

5 the missing data patterns are the same for smokers 

6 and nonsmokers, then you are back where you were with 

7 this analysis, basically. But if the patterns 

8 differ, then you have a bigger difference to adjust 

9 for, so they are farther apart than they were before. 

10 Q. Is there any possibility that the propensity 

11 score analysis of the data set with the imputations 

12 would produce results that suggested that it was more 

13 favorable for linear regression than the propensity 

14 score analysis for unimputed data set? 

15 A. I'm saying it's likely. These analyses, as 

16 reported, are making it more favorable for regression 

17 than the — a more proper analysis would be. So my 

18 feeling is that the situation is worse than this, and 

19 that's why I didn't want to get into doing — I 

20 wanted to just accept what they did for imputations 

21 and do that analysis to show that even there, even 

22 doing — even accepting their imputations, that doing 

23 the analysis the way they are doing it to get SAFs is 

24 not reliable. 

25 Q. Is it possible that performing the propensity 
1 score analysis on the imputed data set would show 

2 conditions less favorable for linear regression 

3 analysis than the propensity score on the unimputed 

4 data set? 

5 A. I don't believe so. I think I sort of said 

6 that. It's almost — 

7 Q. I understood you to say that you would think 

8 probably doing the imputation would make the 

9 situation appear more favorable for doing a linear 

10 regression. 

11 A. Correct. 

12 Q. My question is, although that's your belief 

13 what's probably going to happen, is it possible it 

14 could happen just the other way around? 

15 A. Let me think for a second because that has to do 

16 with how the imputations were not done well and so 

17 here you are doing something incorrectly and doing 

18 one thing incorrectly could it make it worse. In 

19 fact the way these imputations were done, as I'm 

20 scanning through the things that were done, I think 

21 they would only tend to make the smokers and 

22 nonsmokers look more similar on the background 

23 variables that are being imputed than they — than 

24 they really are and therefore I think doing it the 

25 right way would only — would only reveal a bigger 
1 difference. 

2 Q. So it's your belief the way the imputations were 

3 done in this particular situation would make the 

4 propensity scores on the imputed data set appear more 

5 favorable to linear regression than otherwise? 

6 A. Correct, correct. I could try to clarify 

7 because I think in almost all the imputations, for 

8 example, the smoking variable is never used to do the 

9 imputation. Smoking itself is imputed but I don't 

10 think the other analyses use smoking to do the 

11 imputation, which means that when you are imputing 

12 for smokers and nonsmokers, using the same 

13 relationship and making smokers and nonsmokers look 

14 closer together than they would be if you did it 

15 correctly. So that's the — the current basis of 

16 that opinion based on just thinking about it now with 

17 you. I haven't gone into deep thought before about 

18 it, although the general issue is obviously I have 

19 given it lots of thought over the years. But in 

20 response to this question, I'm almost certain that 

21 that would be true. 
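
The point made in this answer can be illustrated with a small simulation. What follows is a minimal sketch under stated assumptions, not the procedure used by the agency or by the Zeger team: the marriage rates (40 percent for smokers, 70 percent for nonsmokers) and the 30 percent item nonresponse rate are made up, and simple mean imputation stands in for the hot deck and model-based procedures discussed in the testimony. All function and variable names are hypothetical.

```python
import random


def simulate_group(n, p_married, p_missing, rng):
    """Return n marital-status values (1 = married, 0 = not), None where missing."""
    values = []
    for _ in range(n):
        married = 1 if rng.random() < p_married else 0
        values.append(None if rng.random() < p_missing else married)
    return values


def observed_mean(values):
    """Mean of the reported (non-missing) values only."""
    seen = [v for v in values if v is not None]
    return sum(seen) / len(seen)


def fill(values, fill_value):
    """Replace every missing value with fill_value."""
    return [fill_value if v is None else v for v in values]


def mean(values):
    return sum(values) / len(values)


rng = random.Random(0)  # fixed seed so the run is repeatable
smokers = simulate_group(5000, 0.40, 0.30, rng)      # assumed rates, illustration only
nonsmokers = simulate_group(5000, 0.70, 0.30, rng)   # assumed rates, illustration only

# Imputation that ignores smoking: one pooled value fills every gap.
pooled = observed_mean(smokers + nonsmokers)
gap_pooled = abs(mean(fill(smokers, pooled)) - mean(fill(nonsmokers, pooled)))

# Imputation that uses smoking: each group is filled from its own reported values.
gap_by_group = abs(
    mean(fill(smokers, observed_mean(smokers)))
    - mean(fill(nonsmokers, observed_mean(nonsmokers)))
)

print("gap when imputation ignores smoking:", round(gap_pooled, 3))
print("gap when imputation uses smoking:   ", round(gap_by_group, 3))
```

In this sketch the pooled fill value lies between the two groups' reported rates, so every imputed smoker is pulled toward the nonsmokers and every imputed nonsmoker toward the smokers. The gap after pooled imputation is therefore smaller than the gap in the reported data, which is the direction of the effect described in the answer.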

22 Q. As a general matter, not on these specific data 

23 sets but just in general, is it possible at least 

24 that imputations could make the situation under 

25 propensity score analysis seem less favorable for 
1 linear regressions? 

2 A. If you did it in a — really incorrectly. I'll 

3 give you an example of what I have in mind. Let's 

4 suppose that we are missing — what's a variable 

5 that's missing a fair bit that's used in these 

6 things? Overweight status, seat belt use, marital 

7 status, let's suppose we imputed all the smokers to 

8 be married and all the nonsmokers to be divorced. 

9 That would make it look bigger on marital status 

10 because they would look more different on marital 

11 status than they were before, presumably at least if 

12 we picked the modal category for smokers and imputed 

13 marital status, let's say most smokers are married, 

14 we made all smokers married and most nonsmokers are 

15 not married, all the nonsmokers not married would 

16 exacerbate the difference and make it look bigger. 

17 You could do something and make it bigger but that's 

18 a hypothetical question that wasn't done here at 

19 all. I can make up examples like that to be — you 

20 know, they are kind of silly, though. 

21 Q. I understand. I just want to find out, it can 

22 happen that way? 

23 A. Yeah. 

24 Q. It depends on what the imputations did? 

25 A. Not in a situation like this where they are 
1 doing imputations the way they are doing it. 

2 Q. Have you looked at each of the imputations that 

3 the agency itself did to determine whether they made 

4 smokers and nonsmokers more alike or less alike? 

5 A. I think they made them more alike because I 

6 don't think that their sequential hot deck 

7 imputation method used smoking as one of the factors 

8 so therefore automatically they are making it more 

9 similar than they would be otherwise. I think that's 

10 correct. 

11 Q. Do you think that the imputations that Dr. 

12 Zeger, Wyant and Miller did to the NMES data set made 

13 smokers and nonsmokers more alike or less alike? 

14 A. I believe they made them more alike, again for 

15 the same reason. I don't believe they used smoking 

16 as a variable when creating the imputations for the 

17 other variables, so therefore they are using some 

18 average value. I think that's correct. I could look 

19 at these charts again and read through the footnotes 

20 to try to refresh my memory but that's my — that's 

21 my global memory now, is that they were done that 

22 way. 

23 Q. But your testimony is that if imputations 

24 are done without taking smoking into consideration, 

25 then that will make smokers and nonsmokers more alike 
1 than before the imputations? 


2 A. Correct. Because it's treating them the same 

3 way. 

4 Q. Professor Rubin, did you prepare any drafts of 

5 your supplemental expert report, Exhibit 3546? 

6 A. I'm sure I did. 

7 Q. Okay. 

8 A. I almost never write the thing and let it be 

9 done. 

10 Q. Did you discuss those drafts with anyone as you 

11 were doing them? 

12 A. Yes. 

13 Q. With whom? 

14 A. I discussed them primarily with Raghunathan and 

15 I discussed them with Peter Biersteker, too. 

16 Q. Did you provide copies of those drafts to 

17 anyone? 

18 A. Yes. 

19 Q. To whom? 

20 A. I provided them to Raghunathan and Peter 

21 Biersteker. 

22 Q. Do you currently have copies of those drafts? 

23 A. No. I get rid of — Just generally when I write 

24 stuff, there are so many drafts that are created, I 

25 work on many papers at the same time. If I kept 
1 drafts around I would be totally confused, more than 

2 I am now. 

3 Q. You did create drafts on the way to the final 

4 report; correct? 

5 A. Yes, but as soon as I had a new one I threw away 

6 the old one. 

7 Q. But you did give copies to Raghunathan and Mr. 

8 Biersteker? 

9 A. Yes. Not as many — 

10 Q. Not all of them? 

11 A. — versions as I went through but I thought 

12 here's something that looks pretty good, I think, 

13 what do you think? Do you understand what I'm 

14 saying? Is it clear, the point's clear? If not, 

15 I'll have to clarify. 

16 Q. Were they given hard copies or disks or how — 

17 what form did the copies take? 

18 A. I think most of it was fax back and forth. 

19 Q. Hard-copy fax? 

20 A. Hard-copy fax, I think. 

21 Q. Do you know whether Raghunathan saved any of his 

22 copies of the drafts? 

23 A. I would hope not because he was my former 

24 student. He would get nothing done if he kept copies 

25 of all this stuff, but I don't know. 
1 Q. You don't know one way or the other? 

2 A. I don't know one way or the other, no. 

3 Q. Did you make any attempt to provide plaintiffs 

4 with your drafts of the supplemental expert report? 

5 A. No. I don't think the drafts — This is an 

6 aside. I don't think the drafts changed except the 

7 labeling of columns to make it clear what was going 

8 on and writing them, to my memory. 

9 Q. Did you have any work papers that you — 

10 Besides drafts, did you have work papers that 

11 you used when you were writing your supplemental 

12 report and doing the analysis on which it was based? 

13 MR. BIERSTEKER: Object to form. 

14 A. I'm not sure what a work paper is. 

15 Q. Any other paper besides a draft of the report, 

16 whether it's just a piece of paper on which you 

17 performed some analysis or whether it's computer 

18 output or whatever it might be. 

19 A. No, I don't believe so. But just to clarify, 

20 there certainly were articles that are referred to in 

21 there that I had available but that's — 

22 Q. I'm referring to the work of you or your 

23 colleagues — 

24 A. Oh. 

25 Q. — on this case — 
1 A. No. 

2 Q. — as opposed to previously — 

3 A. No, no. 

4 Q. — existing articles. 

5 A. No. 

6 Q. So nothing like that was created during your 

7 work on the supplemental report? 

8 A. No. At least not that I remember. I don't 

9 think so. 

10 Q. Earlier you were discussing documents that you 

11 reviewed either in preparing for your deposition or 

12 in — just in your general work on this case; 

13 correct? 

14 A. Correct. 

15 Q. And as one guide, we were using the supplemental 

16 pre-designation Exhibit 3547. 

17 A. Correct. Well, that number, I'll accept the 

18 number. I assume it's right. 

19 Q. Item 11 on that pre-designation Exhibit 3547 is 

20 the book titled The Evolving Role of Statistical 

21 Assessments as Evidence in the Courts; correct? 


22 A. Correct. 

23 Q. And you told me you had looked at that? 

24 A. I had looked at that and in particular the 


25 section that seemed to be relevant to an issue that 
1 arose in the trial. 

2 Q. When was the first time you looked at that book? 

3 A. Saturday. 

4 THE WITNESS: I'll have to tell Steve 

5 Fienberg, my friend, you are Xeroxing instead of 

6 buying it. 

7 MR. LOVE: Off the record a second. 

8 (Discussion off the record.) 

9 BY MR. LOVE: 

10 Q. Doctor Rubin, I'll show you what has been marked 

11 as trial Exhibit 26,046, the evolving — titled The 

12 Evolving Role of Statistical Assessments as Evidence 

13 in the Courts, and ask if that's a copy of the book 

14 that you reviewed on Saturday. 

15 A. It probably is. It feels about the same number 

16 of pages as the Xeroxed copy that I have in my 

17 possession as of Saturday. 

18 Q. So you were viewing a Xerox copy — 

19 A. Correct. 

20 Q. — rather than a printed book itself? 

21 A. Correct. 

22 Q. Fair enough. Now I think you mentioned that — 

23 well do you know the editor of that book, Stephen 

24 Fienberg? 

25 A. Yes, I do. 
1 Q. How do you know him? 

2 A. I've known him probably since 1968. He was a 

3 graduate student in the department of statistics a 

4 couple years ahead of me, I guess, at Harvard. We 

5 bump into each other relatively frequently. 

6 Q. And as I understand this book, if you look at 

7 the second or third page in, it's the result of work 

8 of what's called the Panel on Statistical Assessments 

9 as Evidence in the Courts. 

10 A. By Panel on Statistical Assessments as Evidence 

11 in the Courts, yes. 

12 Q. And that it was sponsored by two committees, one 

13 being the Committee on National Statistics and the 

14 other being the Committee on Research on Law 

15 Enforcement and the Administration of Justice? 

16 A. That's what it says, yes. 

17 Q. Okay. And have you ever been a member of the 

18 Committee on National Statistics? 

19 A. Yes, I was. 

20 Q. And were you a member of that committee at any 

21 time when Stephen Fienberg was also a committee 

22 member? 

23 A. I'm trying to remember. Steve was chair and 

24 Burt Singer became chair but Steve would still come 

25 to meetings, but I don't know if he was on the 
1 committee. 

2 Q. He attended meetings of the committee but you 

3 don't know if he was an official member at the time? 

4 A. Right. 

5 Q. Because if you look just a few pages in where 

6 they list the Committee on National Statistics for 

7 1986-1987, you will see Stephen Fienberg's name 

8 listed as chair, as you indicated. 

9 A. Right. So he was — he was chair for a while 

10 and then Burton Singer, whose name I guess is not on 

11 it, came in afterwards, I believe, and I believe I 

12 was on the committee for a couple, three years when 

13 Burt was chair. 

14 Q. Looking still at that same page at the members 

15 of the Committee on National Statistics, the third 

16 name is Seymour Geisser; is that right? 

17 A. Yes. 

18 Q. Do you know him? 

19 A. Yes, I do. 

20 Q. He is at the University of Minnesota? 

21 A. Correct. 

22 Q. And I notice there was a Nan Laird at the 

23 department of biostatistics at the Harvard School of 

24 Public Health? 

25 A. Yes. 
1 Q. I assume that's a woman. 

2 A. Correct. 

3 Q. Do you know her? 

4 A. Yes. 

5 Q. And also in terms of Harvard representation, 

6 there is John W. Pratt at the graduate school of 

7 business? 

8 A. Yes. 

9 Q. Do you know him? 

10 A. Yes. I think I know everybody here. 

11 Q. Okay. 

12 A. So you don't have to go one by one. The only 

13 one I don't know very well is Courtenay Slater. 

14 Q. Do you have any understanding as to whether this 

15 — the Panel on Statistical Assessments as Evidence 

16 in the courts is a panel of the Committee on National 

17 Statistics? 

18 A. Well I assume — I think it does say that, 

19 doesn't it? 

20 Q. That's my understanding but I wanted to know 

21 have you looked at it. 

22 A. Right. The way this is set up, the Committee on 

23 National Statistics is really like a committee 

24 designed to form committees. It itself, at least in 

25 my experience, doesn't do much work in terms of a 




1 project but what it does is it spends time trying to 

2 decide what are projects that are worth doing, then 

3 try to round up funding for the projects and then 

4 trying to select members to be on the panel and the 

5 panel actually does the work in terms of writing a 

6 report for, typically, for the National Research 

7 Council, National Academy of Sciences, and if they 

8 think it's generally interesting they can look for an 

9 outside publisher for the report. So I was involved 

10 in one of these early on, I don't know when, on 

11 missing data. There was three volumes, committee on 

12 national panel for incomplete data, and the results 

13 were in three volumes that were published by, I don't 

14 know, maybe MIT press. 

15 Q. So that was another panel of the Committee on 

16 National Statistics? 

17 A. Correct. 

18 Q. As you said, you know most of the people listed as 

19 committee members of the Committee on National 

20 Statistics? 

21 A. Correct. 

22 Q. In this particular book. 

23 A. Right. 

24 Q. Are they all well-respected statisticians? 

25 A. Well, Jerry Hausman is an economist; Tom Juster 
1 does some social statistician, survey work, very good 

2 understanding of that; Jane is a demographer; John is 

3 business school; Jim, ES; Courtenay Slater, I'm not 

4 exactly sure, sort of more of I think a policy-type 

5 statistician; Judy Singer is a sociologist with some 

6 strong interest in statistics, does some things in 

7 statistics, and Ken, Ken Wachter also has a 

8 demography side. 

9 Q. Would you say that these people are all 

10 well-respected experts in their fields, anyway? 

11 A. I think they are. I mean — 

12 Q. All the ones that you know. 

13 A. Yeah. Experts in their fields, not being an 

14 economist, not being a sociologist it's sort of 

15 unfair for me to say but I think they are. Certainly 

16 the point of getting people on the Committee on 

17 National Statistics is to try to get a good 

18 representation of people from diverse areas that have 

19 interest in statistics and are well respected, so I 

20 certainly believe that would be the reputation, 

21 statistics and in the fields in which they really 

22 reside. 

23 Q. And for those people who are in the field of 

24 statistics, for instance Stephen Fienberg and Seymour 

25 Geisser, they are well respected in the field of 
1 statistics; correct? 

2 A. Absolutely. 

3 Q. If you look at a few pages earlier, you will see 

4 a listing of people on the Panel on Statistical 

5 Assessments as Evidence in the Courts. Do you have 

6 that page, Professor Rubin? 

7 A. Uh-huh. 

8 Q. In terms of the people on that who are in the 

9 field of statistics, like Stephen Fienberg and 

10 Paul — 

11 A. Meier. 

12 Q. — Meier and Sandy — 

13 A. Zabell. 

14 Q. — Zabell, are they all respected experts in the 

15 field of statistics? 

16 A. Yes. 

17 Q. Were you involved at all in — 

18 A. So is Bill Hunter. 

19 Q. And I didn't mean to leave anyone out. But the 

20 people who are in the field of statistics on the 

21 panel are well-respected experts in that field? 

22 A. Yes, they are well-respected statisticians. 

23 Q. Did you yourself have any involvement in this 

24 particular project? 

25 A. No. 
1 Q. Based on your knowledge of the people involved 

2 in the project and the review of the book that you 

3 were able to make over the weekend, do you consider 

4 it a reliable authority in the field of using 

5 statistics as evidence in courts? 

6 A. I don't know how to answer that because I'm not 

7 an expert on using statistics in courts. I'm sure 

8 that's what it was intended to be and probably may 

9 well be but the reason I'm hesitating is the way 

10 these panels work sometimes, they have very limited 

11 input from some people. Some people put a lot more 

12 time in it than others. No one gets paid anything. 

13 The people that are getting paid are the staff people 

14 and consultants, so the people at the bottom of the 

15 list there, so Miron Straf is a study director, and 

16 research associate and staff associates are people 

17 that are actually paid to do these things, 

18 consultants are paid. See, it's hard to know how 

19 much time different people put in. In some panels 

20 they have I've been on that have a list like this, 

21 one or two people do all the writing and the staff 

22 associates put it all together and in other cases 

23 people, everybody participates. 

24 Q. Does the fact that Stephen Fienberg is the 

25 editor of this publication indicate that he had 
1 substantial involvement in it? 

2 A. I would suspect he did. 

3 Q. Okay. And I understand your — you are a 

4 statistician and expert in statistics and not in the 

5 law. 

6 A. Correct. 

7 Q. And my question then is: Given your knowledge 

8 of the people, in particular Stephen Fienberg as the 

9 editor and your general knowledge of the workings of 

10 the Committee on National Statistics, and your review of 

11 this particular book over the weekend, is it fair to 

12 say it's a reliable authority on the statistical 

13 discussions that it sets forth? Maybe you can't 

14 opine on the application of those statistical 

15 discussions for the law but as far as the statistical 

16 discussions themselves go? 

17 A. It's probably very reliable with respect to what 

18 the current thinking is on that. I guess the reason 

19 I'm hesitating there, I haven't read it, I don't know 

20 whether I would agree with some of the statements and 

21 uses, but it's probably very reliable on what current 

22 thinking is and what the current state of these 

23 things are. I suspect it is very reliable. 

24 Q. And if you look at any — even a textbook on 

25 statistics that status — professors of statistics 
1 would often use and would say is, you know, 

2 reasonably reliable, there may be parts or sections 

3 or even a single page or sentence in that book that 

4 you in particular or another statistician might 

5 disagree with; correct? 

6 A. Correct. 

7 Q. But you would still consider the work in general 

8 to be a reasonably reliable authority? 

9 A. Yes. 

10 Q. And the same thing is true for this book? 

11 A. I assume so. I haven't read through it. 

12 Q. The whole book. 

13 A. But based on the people involved, many of these 

14 people, Paul Meier has been involved in statistics 

15 and law for many, many years. He is in his 70s. 

16 Yeah, Sandy's involved in these things. Yeah, I 

17 assume it is but I — but I can't say more than that 

18 with respect to having read it or agreed with things. 

19 Q. Sure. You said there were certain portions of 

20 the book that you tried to look at over the weekend. 

21 Would you identify that for me? 

22 A. I may need some help but it was a section on 

23 confidence intervals and point estimates that was — 

24 Q. I believe there is an index in the back if this 

25 helps you. I don't know if you used the index over 
1 the weekend to find things or — 


2 MR. BIERSTEKER: I can probably help out. 

3 MR. LOVE: That would be fine. 

4 MR. BIERSTEKER: I think it's pages 114 to 


5 115, approximately, which is the portion of the book 

6 about which Dr. Wyant testified in his testimony. 

7 Q. Can you turn to those pages, Professor Rubin? 

8 A. Okay. Okay, I'm on page 114 in the exhibit. 

9 Yes, that's what I — I think I started on that and 

10 glanced through that part and a little bit further 

11 on. 

12 Q. And that's — 

13 MR. BIERSTEKER: John, I just noticed there 

14 is some highlighting on the copy you gave to the 

15 Professor. 

16 MR. LOVE: Oh. 

17 MR. BIERSTEKER: I don't know if that's 

18 inadvertent. I wanted to note for the record it was 

19 already there before we got it. 

20 MR. LOVE: Sure. We can certainly provide 

21 a Xerox copy. 

22 MR. BIERSTEKER: I don't care. I just 

23 wanted to note that it was there before it came. 

24 MR. LOVE: That's fair. 

25 BY MR. LOVE: 
1 Q. So starting on page 114, there is a section 

2 called "3.3.5 Damages"? 

3 A. Right. 

4 Q. So you reviewed that section? 

5 A. I reviewed and thought about the paragraph at 

6 the bottom of page 114. 

7 Q. Okay. 

8 A. And I read quickly through the rest. 

9 Q. The rest being — 

10 A. Through the end of the chapter. It was just a 

11 quick read. 

12 Q. Just tell me where you stopped reading. 

13 A. One seventeen, I believe. Yes. 

14 Q. Did you read the summary that's on page 117? 

15 A. Quickly. If you ask questions about it, maybe I 

16 could read it again.

17 Q. I just wanted to find out where you started and 

18 where you stopped. 

19 A. That's basically what it was. 

20 Q. All right. Did you come to any expert opinions 

21 about what's set forth in those pages? 

22 MR. BIERSTEKER: Object to the form. 

23 A. I don't know if they would be called expert 

24 because I am not an expert in interface of law and 

25 statistics. I thought about this issue of point 
1 estimates and the role confidence intervals play in

2 damages, and whether I agreed with the — with the 

3 drift of it, there are some things I don't — I don't 

4 agree with that they were saying. 

5 Q. Let me try to ask a more clear question, then. 

6 A. Okay. 

7 Q. Because you said you weren't an expert in the 

8 application of legal rules to these topics. 

9 A. Correct. 

10 Q. In reviewing these pages, did you come to any 

11 conclusions of your own regarding what's set forth 

12 there? 

13 A. Conclusions of my own concerning — 

14 Q. Just you said you reviewed this and did you come 

15 to any conclusions whether you agreed with parts or 

16 disagreed with parts or don't understand parts? 

17 MR. BIERSTEKER: I'll object to the form. 

18 A. Well there are some parts that I said — there 

19 is a sentence I don't understand, don't agree with at 

20 all, the last sentence on 114, goes to 115. It seems 

21 to be an odd way of saying. I didn't understand 

22 that, but that's a minor point. 

23 Q. Okay. 

24 A. A more general issue is this idea of what role, 

25 I believe what role confidence intervals should play 
1 when they are large, when assessing damages, and kind

2 of thinking of that, what was going on in my mind is, 

3 if the whole analysis is — I can understand this if 

4 it's an individual situation, individual case, I 

5 guess, where you try to make projections into the 

6 future. There is always some uncertainty about 

7 what's going to happen. But you have sort of the 

8 full set of data in front of you that can be obtained 

9 or can be reasonably expected to be obtained because 

10 you don't have data in the future. Someone has been 

11 damaged, we don't know what his life history would be 

12 like in the future necessarily without it, so we have 

13 to make assessments of that kind. I can understand 

14 at that point the idea of confidence intervals 

15 relative to point estimates, that — that confidence 

16 intervals may be relatively — viewed as relatively 

17 unimportant, relative to the point estimates, 

18 although at the same time my thinking was there is 

19 another kind of uncertainty which has to do with 

20 model uncertainty. You don't know how to do the 

21 projection in time, that some people think of as 

22 being within confidence intervals that are not. 

23 Confidence intervals arise because of sample 

24 variability. This is focused on sample variability 

25 and if that's the best data you have, I can 
1 understand where that's true.

2 In another situation where the whole analysis is 

3 based on surveys, population level, where you could 

4 do a better job, for example taking this case where 

5 you could do a better job by going out and getting 

6 some information, for example from Minnesota people, 

7 and getting the right information to drive those 

8 confidence intervals down, I don't understand why 

9 this argument really should hold. If it's the best 

10 data you can get, then I understand needing a point 

11 estimate being adequate. The confidence interval 

12 will be what it will be. Why that should apply to a 

13 situation where you could get better data, you could 

14 get more focused data and there is huge amounts of 

15 money involved, I don't quite see why the fact that 

16 there is a gigantic confidence interval should say 

17 that's all you have to do. You don't have to do 

18 anymore despite the fact the confidence level is 

19 gigantic. Don't you have an obligation to collect 

20 better data because it's available? So I sort of

21 disagreed with that but that's in context. I don't 

22 know — The title of the chapter is on — what is it 

23 on? I don't know if it even applies to the case. 

24 That's my legal naivete. Maybe they are focused 

25 already on this antitrust stuff. That's the kind of 
1 thinking that took place when I went through that.
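
The point in the answer above, that confidence intervals reflect sample variability and can be driven down by collecting more data, can be illustrated with a minimal sketch. It assumes a simple random sample and a normal approximation; the standard deviation of 1000 and the sample sizes are hypothetical and are not figures from the case. The sketch says nothing about model uncertainty, which the witness treats as a separate matter.

```python
import math

def ci_half_width(sd: float, n: int, z: float = 1.96) -> float:
    """Half-width of an approximate 95% confidence interval for a mean."""
    return z * sd / math.sqrt(n)

# Hypothetical spread of per-person values; not taken from the record.
sd = 1000.0
for n in (100, 400, 1600):
    # Quadrupling the sample size halves the interval's half-width.
    print(n, round(ci_half_width(sd, n), 1))
```

Under these assumptions the interval narrows in proportion to one over the square root of the sample size.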

2 Q. Okay. If there were some reason why you could 

3 not go out and survey the Minnesota claims 

4 population, either a legal obstacle or any other 

5 obstacle that would just prevent that from happening, 

6 would that change your thinking as you just described 

7 to me? 

8 A. To the extent it becomes — you have done the 

9 best job possible, then I think these arguments seem 

10 to intellectually have more force to them. In this 

11 case, there is — in the Minnesota situation, there 

12 is so much extrapolation taking place from existing 

13 data sets which appears to be — potentially be 

14 unnecessary, at least without trying to get more 

15 relevant data, that I don't see it being in that 

16 state yet. But to the extent that it were true, that 

17 it's impossible to get better information to try to 

18 try to drive the confidence intervals down so we have 

19 some belief that what we are what we are estimating 

20 is fairly accurate, if it's impossible to do better, 

21 then that that carries weight. That carries weight 

22 also for the sensitivity, differ assumptions, 

23 different modeling assumptions. If you can't do a 

24 better job, then I guess you have to live with what 

25 you have got. But to the extent that it's an 
1 important issue, you should explore carefully to the


2 extent possible that you can get better answers and 

3 do better analyses than just off-the-top-of-the-head 

4 analyses. Not off the top of the hat. Bad 

5 description. 

6 (Interruption by the reporter.) 

7 A. I was trying to adjust my answer to use a word I 

8 didn't want to use. 

9 Q. You didn't mean "off the top of the hat"? 

10 A. No. What I meant was, analyses that are not 

11 carefully done in the sense of being attuned to 

12 issues done by the propensity score analyses that I 

13 did, the fact that you can't rely on the answers that 

14 are being produced for reasons that are evident in 

15 the data themselves. 

16 What I was thinking about when I said "off the 

17 top of the hat," just to be clear, was this statement 

18 somewhere in here that regression analyses are 

19 commonly done and are therefore the method of often 

20 of choice. 

21 Q. When you say "somewhere in here," you mean 

22 somewhere — 

23 A. In the book. 

24 Q. Fienberg book? 

25 A. Fienberg book. 
1 Q. Trial exhibit —

2 A. There is a sentence somewhere that says that — 

3 I don't know where it is now — that very often the 

4 regression analyses are used and that would — not 

5 off the top of the hat, is a bad description for 

6 that. An off-the-shelf model without really thinking 

7 through what the consequences are. There are big 

8 damages involved, it would seem to me appropriate to 

9 think carefully about what an appropriate analysis 

10 model data combination, appropriate combination that 

11 addresses the real issue and not just use a 

12 regression analysis because it was — it was used in 

13 this way one time before. 

14 Q. Professor Rubin, I think we probably should 

15 break for lunch soon. Before we do, I want to, sort 

16 of helping me understand your propensity score 

17 analysis and truly from a layman's point of view, I 

18 was going to ask if we could after lunch, if you 

19 could sort of walk me through a simple example and I 

20 can tell you before lunch and maybe it helps you to 

21 think about it. 

22 A. Sure. 

23 Q. Maybe it doesn't. 

24 Here's my simple example I'd like to try to walk 

25 through and understand how the propensity scores are 
calculated and how the comparison at least of the
means between two variables are calculated and that 
your first test, your first test, I think, in your 
expert report, whether the means are a certain amount 
of standard deviation apart. And in my example, we 
just have two variables, smoker, yes or no, and 
gender, either male or female. I want to try to set 
up what I think is called a two-by-two table where 
the smoker variable is either zero if you are not a 
smoker or one if you are a smoker and the gender 
variables are the zero if you are a male or one if 
you are a female, and that in this population sample 
we have got 100 male nonsmokers, 200 male smokers, 

200 female nonsmokers and 100 female smokers. Does 
that make sense to you? 

A. Yes. 

Q. So after lunch, if we could try to walk through 
that. If taking that helps you at all, you are more 
than welcome to take it. 

A. Sure. 

Q. Then let's break for lunch. 

(Luncheon recess taken at approximately 

12:27 p.m.) 
1 AFTERNOON SESSION

2 (Deposition reconvened at approximately

3 1:39 p.m.)

4 BY MR. LOVE: 

5 Q. Professor Rubin, I showed you this two-by-two 

6 table that I put together just before lunch and told 

7 you I'd like to have you walk me through this little 

8 example so I understand how propensity scores are 

9 calculated and how you go about making at least the 

10 first comparison, the comparison of the means of the 

11 two different variables, I guess you would call 

12 them. 

13 A. Okay. 

14 Q. Two different groups. So can you sort of do 

15 that and walk me through? If you need a calculator, 

16 I'll be happy to get it for you. 

17 A. No, I don't think there is any need. 

18 This particular case does not reveal much at all 

19 about propensity scores because it's so simple. In 

20 fact, the idea of propensity scores is to try to take 

21 a complicated situation with many background 

22 variables, here is just one gender, and create a 

23 display that's as simple and obvious as this 

24 display. So in this particular case, for propensity 

25 scores, I couldn't do anything other than what you 
1 have already done. The smoker is the outcome in this

2 case in the sense that it's the exposure, yes/no, as 

3 you have indicated, or in some other kind of jargon, 

4 be the treated and untreated or treated and control 

5 groups. I may slip into using that jargon instead. 

6 So, the no smokers would be control and the yes 

7 smokers would be treated. 

8 Q. And at times when you use X and Y variables I 

9 think in your supplementary report, would that smoker 

10 be the Y variable? 

11 A. Not really. Smoker would be the treatment 

12 variable and Y would be like dollars. 

13 Q. Okay. 

14 A. Because the outcome you really care about and 

15 you try to estimate in some sense the effect of 

16 smoking on Y, on dollars, controlling for X gender 

17 and other things. So in this case that you have 

18 drawn, the only variable that we are trying to adjust 

19 for is gender. We are trying to compare like with 

20 like with respect to gender, which means we are 

21 trying to compare male smokers with male nonsmokers 

22 and female smokers with female nonsmokers, so in this 

23 case the propensity score formally, since there is 

24 only one X, is identical to that X, so X is gender, 

25 so in this example the propensity score is 

STIREWALT & ASSOCIATES 

P.O. BOX 18188, MINNEAPOLIS, MN 55418 1-800-553-1953 


https://www.industrydocuments.ucsf.edu/docs/fkhd0001



593 


1 male/female. 

2 Q. Does it have a value which is male/female? 

3 A. Well female you wrote down as one and male you 

4 wrote down as zero so in this case the propensity 

5 score is — is the zero one variable. The 

6 probability, it's the probability — okay. If you 

7 are — if you are male, the probability of being a 

8 smoker is two-thirds, if you are female the 

9 probability of being a smoker is one-third, so that 

10 would be the propensity score. 
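
The propensity scores just described can be reproduced from the counts in the two-by-two table. The following is a minimal sketch; the dictionary layout and the function name `propensity` are illustrative choices, not anything taken from the supplemental report.

```python
from fractions import Fraction

# Counts from the two-by-two table in the example: (gender, smoker) -> people.
counts = {
    ("male", 0): 100, ("male", 1): 200,
    ("female", 0): 200, ("female", 1): 100,
}

def propensity(gender: str) -> Fraction:
    """Probability of being a smoker given gender, the only background variable."""
    smokers = counts[(gender, 1)]
    total = counts[(gender, 0)] + counts[(gender, 1)]
    return Fraction(smokers, total)

print(propensity("male"), propensity("female"))  # 2/3 1/3
```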

11 Q. Okay. So two-thirds for males and one-third for 

12 females? 

13 A. Yeah, is the smoker/nonsmoker. 

14 Q. Okay. And then as I understand it, you do 

15 something to figure out whether — 

16 And that would be a mean? 

17 A. That's right. Yeah. So the — the average 

18 value — all the males have propensity score

19 two-thirds and all the females have a propensity 

20 score of one-third. 

21 Q. Okay. Then you do something to compare the — 

22 those two means. 

23 A. Well basically those two distributions. 

24 Q. All right. 

25 A. Which in this case are the kind of — kind of 





594 


1 trivial because you look at the average value for the 

2 propensity score for males — sorry — for smokers 

3 and look at the average value of the propensity score 

4 for nonsmokers and see how far apart they are. More 

5 generally, you have, instead of just a zero one 

6 variable, you have a whole collection of continuous, 

7 like age, you would have lots of ages represented so 

8 there would be a distribution, but what you are 

9 looking at is the distribution of gender for smokers 

10 and comparing that to the distribution of gender for 

11 nonsmokers. So if you look at a histogram 

12 representing distributions, would you like to do 

13 that? 

14 Q. Yeah, that would help me picture it. 

15 A. Okay. So for smokers, here is zero and here is 

16 one, the two — two values for the propensity score, 

17 and the histogram would go up one-third on zero and 

18 two-thirds on one because they are females and then 

19 for non — I did nonsmokers. Sorry. Nonsmokers 

20 would be one-third and two-thirds and for smokers — 

21 I'll do it down here. Smokers, it would be 

22 two-thirds and one-third. The histogram is — 

23 In this case where there is only one variable, 

24 this is the histogram for nonsmokers, there is a 

25 histogram for smokers, and so that corresponds to 
1 these histograms on Trial Exhibit 2278, where instead

2 of just having two values, zero and one, there are 

3 lots of possible values for the X variable, for the 

4 background variable. 

5 (Interruption by the reporter.) 

6 A. So here there are just two, so we are looking 

7 like that, so this is what? The top one is 

8 histograms for nonsmokers, so for the — since only 

9 two possible values instead of however. There are a 

10 dozen different values here. You just have two — a 

11 third, two-thirds and for the smokers you would have 

12 two-thirds, one-third, the two values. 


13 Q. Right.

14 A. Is that clear?

15 Q. Well that's what I was thinking. If you had a


16 histogram that looked like Trial Exhibit 2278, you 

17 would have some — some bar like you drew here — 

18 A. Correct. 

19 Q. — at one-third and some bar at two-thirds for 

20 smokers and another size bar at one-third, another 

21 size bar at two-thirds for the nonsmokers. 

22 A. See if I'm saying this right. For the — Yeah, 

23 because — yeah, the way — I'm sorry. That's 

24 probably a better way. Because in this case it 

25 doesn't make any difference how you indicate the 
1 variable. The problem — the specific probabilities

2 for these — for these people who are male for being 

3 a nonsmoker is one-third and two-thirds. 

4 Q. Okay. 

5 A. And down here it's one-third and two-thirds. 

6 Let me look to see whether I have probabilities here, 

7 or it's called linear propensity scores, to make it 

8 clear. These are actual probability, so these are 

9 propensity scores. 

10 In the supplemental report I talked about how — 

11 that you can actually do the plots in the probability 

12 scale or you can do it in what's called the linear

13 scale, which are the variables that went into them, 

14 and for reasons I described in the supplemental 

15 report, they are usually done in terms of linear 

16 scores, not the probabilities per se. But in this 

17 case it makes absolutely no difference being what the 

18 picture is. 

19 Q. The picture looks pretty much the same? 

20 A. Is the same. The males are coded zero and they 

21 all have a probability of one-third being nonsmoker 

22 and two-thirds being a smoker, for the females it's 

23 two-thirds. 

24 Q. So if we do this sort of revised chart, we have 

25 the bars at one-third and two-thirds, is that the 
1 probability scale?

2 A. Yes, that would be the probability, because the 

3 — among the nonsmokers, all the males have a 

4 probability. Let me try to be — probably being — 

5 You know what I'm doing? I'm switching around. The 

6 — the probability for males of being a smoker, for 

7 all the males the probability is two-thirds. 

8 Q. That makes sense to me. 

9 A. Right. And the probability being a smoker for 

10 female is one-third. 

11 Q. And that's what you have shown with these bars 

12 here? 

13 A. Right. 

14 Q. As I understand it. 

15 A. Right. 

16 Q. Okay. And then so that's the histogram that at 

17 least shows something like what we have seen? 

18 A. Right. 

19 Q. I can compare it to what Raghunathan had done in 

20 this Trial Exhibit 2278. 

21 A. Correct. 

22 Q. And then — and then you went on, as I 

23 understand it, in the supplemental report and you get 

24 what's called the mean propensity score? 

25 A. Right, the average value. 
1 Q. So can you show me how you do that in this

2 example? 

3 A. Okay. The — the way I was thinking about it at 

4 lunch, I wasn't doing the probability scale because 

5 in the supplemental report I — I do it in the linear 

6 form instead, so to make it more consistent with that 

7 I was just — the way I was thinking about doing it 

8 is just having the zero one rather than the 

9 probabilities themselves. 

10 Q. It will help me to understand it because I don't 

11 — I sort of had some conception of what these 

12 histograms looked like, I thought, from example 

13 2278. If you could keep that form — 

14 A. I'll try to move more with the supplemental 

15 report notation. 

16 Q. The methodology will be similar, just a 

17 different scale or something? 

18 A. Correct. Yeah, correct. I was thinking about 

19 the other way. If I can take a minute to think about 

20 it this way instead of correcting it on the — on the 

21 fly. 

22 Q. Sure. 

23 A. Maybe the way I wrote it here isn't quite 

24 right. Let me make sure it's right. 

25 Okay. I'm just mumbling to myself now. 




599 


1 MR. LOVE: We can go off the record. 

2 THE WITNESS: Yeah, let's go off the 

3 record. 

4 (Discussion off the record.) 

5 BY MR. LOVE: 

6 Q. So now you are going to show me how to get the 

7 mean propensity score? 

8 A. Yeah, mean propensity score. Okay. And this I 

9 had backwards, too, previously these bars I had 

10 backwards, so the one on the right is — is now 

11 correct. I was reversing the way we coded male and 

12 female. 

13 Q. Okay. 

14 A. Among the nonsmokers, a third of the people have 

15 — among the nonsmokers — 

16 Q. Uh-huh. 

17 A. — there are — there are two-thirds of the 

18 people are females and have propensity score 

19 one-third, so the probability, if you are a female, 

20 the probability of you are a smoking is one-third, 

21 the probability that you are a smoker as male is 


22 two-thirds, so the two values of the propensity score

23 are one-third and two-thirds.

24 Q. Okay.

25 A. Okay?
1 Q. Does it make any difference which is on the left

2 and which is on the right? 

3 A. The only reason I did it, the value, these are 

4 values of propensity scores so they go from low to 

5 high. 

6 Q. Okay. 

7 A. Just as do here, go from low to high. That's 

8 what I was reversing before and managed to completely 

9 confuse myself in so doing. It was not intentional but

10 happens. 

11 Q. Okay. 

12 A. So this is the probability of being a smoker, is 

13 either one-third or two-thirds. 

14 Q. Okay. 

15 A. One-third if you are female, two-thirds if you 

16 are a male. Now among the people who are nonsmokers, 

17 two-thirds of them are females. 

18 Q. Yes. 

19 A. So this is — Here is a distribution. So among 

20 the nonsmokers, there are two-thirds that are 

21 females. 

22 Q. Yeah. 

23 A. And one-third are males. The thing that's a 

24 little confusing is, two-thirds have value one-third 

25 and one-third have value two-thirds. 
1 Q. Right.

2 A. Okay? Now if you go among the smokers, the 

3 females that are with the ones with value one-third 

4 and the male are the ones with value two-thirds. All 

5 right. That's what — that's males — females have 

6 probably one-third propensity score of being a 

7 smoker, males are a probability of two-thirds being a 

8 smoker. Among the smokers, what proportion of people 

9 have value one-third? That's the females. I don't 

10 think the smokers, one-third of people are smokers, 

11 so the — the proportion of — among the smokers — 

12 what did I say? I don't know if I said that last 

13 sentence correctly. What I meant to say was, among 

14 the smokers, one-third of the people are females, one 

15 — therefore one-third of them have propensity score 

16 of one-third, and that's why this is one-third, this 

17 height of this bar, histogram, is one-third. Among 

18 the smokers, two-thirds of the smokers are male; 

19 therefore, among the smokers, two-thirds of the 

20 people have propensity score of two-thirds, and 

21 that's the histogram here for — on the smoking group 

22 for males has — rises two-thirds above the value 

23 two-thirds. 

24 Q. Right, as high as the — 

25 A. The females, where the reverse is true for the 
1 — among the nonsmokers.

2 Q. Okay. 

3 A. So this shows the difference in distribution and 

4 the propensity scores between nonsmokers and smokers. 

5 Q. And these are the bar charts up on the right 

6 side of the page? 

7 A. Correct. These two charts, the nonsmoker one 

8 corresponds to nonsmoker one in this trial exhibit, 

9 and the smoking one down here which has two bars 

10 corresponds to this one for smokers. I think that's

11 — this is a histogram for nonsmokers, histogram of 

12 smokers; correct. 

13 Q. Okay. 

14 A. Now there is a mean value for this thing called 

15 one-third and two-thirds, in this — among 

16 nonsmokers, and there is a mean value for among the 

17 smokers, "mean" meaning the same as average. 

18 Q. Uh-huh. 

19 A. I don't think the nonsmokers, if I didn't screw 

20 this up, the mean value for this thing, 

21 one-third/two-thirds, is four-ninths, I believe. 

22 Q. Is that — 

23 A. I'll tell you how I got that. A third of the 

24 people have a value one-third, one-third times 

25 one-third. 
1 Q. One-third times one-third.

2 A. And two-thirds of the people — I'm sorry, I 

3 didn't do that right. 

4 Q. Okay. 

5 A. Two-thirds times one-third, two-thirds of the 

6 people had value one-third. 

7 Q. So that's two-ninths? 

8 A. Correct. 

9 Q. Two-ninths. 

10 A. Correct. Two-thirds of the people had value 

11 one-third. That's two-thirds times one-third when 

12 added together — 

13 Q. That's two-ninths again; right? 

14 A. Right, so two-ninths plus two-ninths is 

15 four-ninths, so that's the mean value of the 

16 propensity score of the nonsmokers. 

17 Q. Of the nonsmokers. 

18 A. Correct. 

19 Q. Okay. 

20 A. Now let's go to the smokers. One-third of the 

21 people have value one-third. That's one-third times 

22 one-third, is one-ninth, and two-thirds of the people 

23 have value two-thirds. 

24 Q. So that's four-ninths? 

25 A. Correct. One-ninth plus four-ninths is 
1 five-ninths.

2 Q. So that would be the mean propensity score for 

3 the smokers? 

4 A. Correct. 
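
The two mean propensity scores, four-ninths for nonsmokers and five-ninths for smokers, follow from weighting each score by the number of people in the group who carry it. A minimal sketch, with the names `counts`, `score` and `mean_score` chosen here for illustration only:

```python
from fractions import Fraction

# Counts from the two-by-two table: (gender, smoker) -> people.
counts = {
    ("male", 0): 100, ("male", 1): 200,
    ("female", 0): 200, ("female", 1): 100,
}
# Propensity score (probability of being a smoker) for each gender.
score = {"male": Fraction(2, 3), "female": Fraction(1, 3)}

def mean_score(smoker: int) -> Fraction:
    """Average propensity score within the smoker (1) or nonsmoker (0) group."""
    group_total = sum(n for (g, s), n in counts.items() if s == smoker)
    weighted = sum(score[g] * n for (g, s), n in counts.items() if s == smoker)
    return weighted / group_total

print(mean_score(0), mean_score(1))  # 4/9 5/9
```

For the nonsmokers the two terms are two-thirds times one-third (the females) and one-third times two-thirds (the males), each equal to two-ninths.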

5 Q. Smokers, okay. And then in the first, I think, 

6 bench mark or test that you used you compared the 

7 difference in those means in some way? 

8 A. That's right, the difference of the means, but 

9 standardized by the number of standard deviations 

10 because we could have — Let me slow down here a 

11 second. In this display where it's 2278 — 

12 Q. This is trial Exhibit 2278. 

13 A. Right. I'm working on probability scale, and 

14 when I see this standard deviations away — 

15 Q. SD Away? 

16 A. Right. What I do is, I take the difference in 

17 those two means, which is one-ninth, and I divide it 

18 by the — an average within group standard deviation. 

19 Q. Right. 

20 A. To say how — to show how far apart the scores 

21 are in a — normed by within group standard 

22 deviation. 

23 Q. Okay. 

24 A. With propensity scores themselves, that's not a 

25 critical thing to do, but it's critical when you are 
1 looking at — through the linear form of the scores,

2 which is the thing I do in the supplemental report, 

3 which is actually more relevant. Suppose I measured 

4 — I called male zero and females 47,000, I get a 

5 completely different answer in terms of how far apart 

6 they are in means, just as I— coding it zero one, 

7 they call it zero and 47,000, so you have to divide 

8 by some common scale factor. And so what I then do 

9 is look at how many standard deviations apart the 

10 groups are. That's the standard deviation away. And 

11 in this case, if I did the calculations right, the 

12 number standard deviations apart is one over the 

13 square root of two, or about point seven. 

14 Q. Okay. 

15 A. That's what I was checking, calculating both 

16 ways. It's a trivial calculation but I wanted to 

17 make sure. 

18 So in this particular picture, you would have 

19 the number standard deviations between these two 

20 groups, would be point seven. 
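
The remark about coding females as 47,000 instead of one can be checked numerically. The sketch below is an illustration only: the helper names are hypothetical, and it assumes variances computed with a divisor of n, which reproduces the figure of about point seven. The raw difference in means changes with the coding, while the difference measured in standard deviations does not.

```python
import math

def mean(values):
    return sum(values) / len(values)

def variance(values):
    # Assumption: population variance (divide by n), matching the example.
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)

def standardized_difference(a, b):
    """Difference in means divided by the square root of the average variance."""
    pooled_sd = math.sqrt((variance(a) + variance(b)) / 2)
    return (mean(a) - mean(b)) / pooled_sd

def gender_values(female_code, n_male, n_female):
    """Males coded zero, females coded with the chosen value."""
    return [0.0] * n_male + [female_code] * n_female

for code in (1.0, 47000.0):
    nonsmokers = gender_values(code, 100, 200)
    smokers = gender_values(code, 200, 100)
    raw = mean(nonsmokers) - mean(smokers)
    print(code, round(raw, 2), round(standardized_difference(nonsmokers, smokers), 3))
```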

21 Q. Okay. The way you did that was taking the 

22 five-ninths and subtracting the four-ninths for the 

23 numerator? 

24 A. Correct. 

25 Q. And that's one-ninth for the numerator? 
1 A. Correct.

2 Q. You are going to take that one-ninth and divide 

3 it by the standard deviation? 

4 A. Within a group. 

5 Q. Within any particular group or — 

6 A. Well in this case they are the same because of 

7 the — of the symmetry. 

8 Q. Okay. 

9 A. And in general, though, in this output, this is 

10 defined to be the average variance in the two groups 

11 divide — so the average is the variance here plus 

12 the variance there divided by two and then take the 

13 square root. 

14 Q. So in a more complicated example, you take the 

15 two variances, divide them by two and take a square 

16 root? 

17 A. Correct. 

18 Q. But in this case since they are the same you can 

19 use one? 

20 A. Yes, easy thing to do unless I blundered again 

21 in some way, but that's what I was trying to go on 

22 there. 

23 Q. What you get when you have done that calculation 

24 is, you divide it by the standard deviation and the 

25 final result of one-ninth divided by the standard 
1 deviation is about point seven?

2 A. Point seven, I think, is the square root of two 

3 over two, I think is point seven, seven oh seven. 
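
The standard-deviations-away figure for this example can be worked through on the probability scale. A minimal sketch, assuming the distributions drawn in the histograms and the definition given in the testimony (difference in means divided by the square root of the average of the two group variances); the variable names are illustrative:

```python
import math
from fractions import Fraction

# Each group's distribution: propensity score value -> share of the group.
nonsmokers = {Fraction(1, 3): Fraction(2, 3), Fraction(2, 3): Fraction(1, 3)}
smokers = {Fraction(1, 3): Fraction(1, 3), Fraction(2, 3): Fraction(2, 3)}

def mean(dist):
    return sum(value * share for value, share in dist.items())

def variance(dist):
    m = mean(dist)
    return sum(share * (value - m) ** 2 for value, share in dist.items())

diff = mean(smokers) - mean(nonsmokers)                    # 5/9 - 4/9 = 1/9
avg_var = (variance(smokers) + variance(nonsmokers)) / 2   # 2/81 in each group
sd_away = float(diff) / math.sqrt(avg_var)

print(diff, avg_var, round(sd_away, 3))
```

Each group has variance two eighty-firsts, so the within-group standard deviation is the square root of two over nine, and one-ninth divided by that is one over the square root of two.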

4 Q. Okay. 

5 A. Now, so in this case what it is showing you is, 

6 there is a difference between the two groups. They 

7 happen to have the same standard deviation and they 

8 have the same sample size, and clearly what you would 

9 like to do in comparing like with like in this 

10 example is compare some outcome like expenditures or 

11 disease rates for the males and separately for the 

12 females, and that would be adjusting for sex. And 

13 because the distributions overlap in this case, you 

14 can do that, there are males, both males and females 

15 — sorry — both smokers and nonsmokers among males 

16 and smokers and nonsmokers among females. It's 

17 completely obvious from this example because it's so 

18 simple to see, because there is two levels for this

19 background variable that you are trying to control

20 for. 

21 Q. But this shows us how to get the — the 

22 standard-deviations-away figure for this little 

23 example? 

24 A. For this little example, that's correct. But 

25 there is no payoff here calculating the propensity 
1 scores, it's probabilities. There is no payoff to

2 doing that because you already know there are only 

3 two levels of the background variables you want to 

4 adjust for, and so it's completely obvious how far 

5 apart these groups are and whether or not you can 

6 adjust. So there is no payoff in doing the version 

7 of calculation we did, just look at it and we can — 

8 we can calculate it if we want to but there is no 

9 benefit. I mean in a statistical sense. There may 

10 be benefit in understanding. I'm not trying to 

11 quibble. You would still end up within the males 

12 comparing — you still end up within the males 

13 comparing smokers and nonsmokers and females compares 

14 smokers and nonsmokers and no need to do the 

15 propensity score analysis for any statistical 

16 purpose, although I perfectly understand why you want 

17 to do it here, for understanding. 

18 MR. LOVE: Let's mark this as an exhibit so 

19 I know what we are talking about. 

20 THE WITNESS: Shall I sign it? 

21 (Plaintiffs' Deposition Exhibit 3551 was 

22 marked for identification.) 

23 MR. BIERSTEKER: Do you think you will 

24 understand this more than a week from now? In which 

25 case, we could clean it up. 




609 


1 THE WITNESS: Let me reinforce that. If 

2 you would like me to try and write out all the pieces 

3 to that to make clear what's going on in that 

4 example, I would be pleased to do so. 

5 BY MR. LOVE: 

6 Q. That might help when you have the time to do 

7 that but for now, just so we get — I'll show you

8 what we have marked as Exhibit 3551, which is the

9 two-by-two chart that I wrote down and showed you 

10 before lunch and you have been telling me about 

11 propensity score calculations on; right? 

12 A. Correct. 

13 Q. What we see at the top is the two-by-two table; 

14 correct? 

15 A. Yes. 

16 Q. Over on the left side, below that, are these 

17 histograms — 

18 A. Right side. 

19 Q. On the right side of the paper is the histograms 

20 that you drew, and when you went through and did the 

21 final calculation on this one you have a mean 

22 difference of point — roughly point seven standard 

23 deviation? 

24 A. Correct. 

25 Q. Professor Rubin, I'm just going to mark as an 
1 exhibit I believe the four court orders we had

2 designated and are listed in the pre-designation. 

3 You mentioned you had looked at those briefly, at 

4 least, over the weekend; right? 

5 A. Correct. 

6 (Plaintiffs' Deposition Exhibits 3552 

7 through 3555 were marked for 

8 identification.) 

9 BY MR. LOVE: 

10 Q. Professor Rubin, I'll show you what we have had 

11 marked as Exhibits 3552, 3553, 3554 and 3555, which are

12 the four court orders that we had pre-designated for 

13 your deposition, ask if you recall seeing those over 

14 the weekend. 

15 A. More or less, that's right. There is much 

16 that's unfamiliar — 

17 Q. Sure. 

18 A. — jargon and methods, but I certainly do 

19 believe these are the ones that I glanced at. 

20 Q. Okay. Let's go back to the first one. Exhibit 

21 3552. Do you understand, Professor Rubin, that in

22 this case the court has ruled that in measuring 

23 plaintiffs' damages that the parties must not factor 

24 in the mortality of smokers and nonsmokers? 

25 A. In the sense that it's described it here, I 
1 believe, written the defendants and —

2 (Interruption by the reporter.) 

3 A. In the sense that where it says here "WHEREAS, 

4 Defendants concede and Plaintiffs agree that the 

5 tobacco industry should not benefit from the 

6 premature deaths of smokers." 

7 Q. Okay. If you look down two whereas paragraphs 

8 from that, do you see where it says "WHEREAS, 

9 Defendants claim that they must factor in the 

10 mortality of smokers and nonsmokers in order to 

11 challenge Plaintiffs' damages model"? 

12 A. Yes, I see that. 

13 Q. All right. And do you understand that the 

14 ultimate ruling of the court is that factoring in 

15 those relative mortality of smokers and nonsmokers 

16 must not be done in measuring damages in this case? 

17 A. I think I understand what that — what that 

18 means and I thought about what that means and, yes, I 

19 do believe I understand it. 

20 Q. Can you tell me what that means to you? 

21 A. Well that is, if I were to think about a person 

22 in the current world who is a smoker and what he 

23 would be like in either a world without smoking or in 

24 a world without alleged misconduct, and that if you 

25 — now you track through time seeing what that 




612 


1 person's healthcare costs would be under those two 

2 worlds, that if in fact under those two worlds, one 

3 real and one hypothetical, the smoker dies in the 

4 real world early and whereas he would have lived 

5 longer under the world without smoking either because 

6 of misconduct or otherwise, that no costs after that 

7 that the state would save can be brought in. 

8 Q. So that's your understanding of the — 

9 A. Correct. 

10 Q. — rule for damages in this particular case? 

11 A. Pardon? 

12 Q. That's your understanding of the court's rule 

13 for damages in this particular case? 

14 A. Correct, correct, because it says you cannot 

15 take into account anything that would have the — a 

16 premature, I think is the word, death of smoker, what 

17 I think of that as being death before he would have 

18 died in the absence of smoking, that after that event 

19 takes place, those costs of his being without smoking 

20 cannot be used to offset any other costs. 

21 Q. And the next three orders, three exhibits, 

22 Exhibits 3553, 54 and 55, you understand that the 

23 court has determined that in this case that 

24 smoking-attributable expenditures constitute an 

25 indivisible injury that cannot be apportioned into 
1 how much of those expenditures would have been

2 incurred in a world without defendants' misconduct, 

3 for instance? 

4 MR. BIERSTEKER: I object to the question 

5 as mischaracterizing what the court has found. 

6 A. That wasn't my understanding. 

7 Q. What is your understanding? 

8 A. I'm not precisely sure what it is, — 

9 Q. Okay. 

10 A. — frankly. The fact is, when I think about it, 

11 what it might mean, is everybody does get sick and 

12 die at some point, and so there are expenses that 

13 people incur. I'm not — I'm not exactly sure what 

14 the indivisible thing is. Perhaps I should be more 

15 aware of that but — 

16 Q. Anyway, sitting here today, you don't have an 

17 understanding of what the court's ordering in these 

18 three exhibits? 

19 A. Well I think I understand — 

20 Q. Not the first one, the last group of three, 

21 which is Exhibits 33 — sorry — 3553, 3554 and 3555? 

22 A. I thought I had — Maybe we can take them one at 

23 a time. 

24 Q. Sure. 

25 A. If you are representing to me that of the next 
1 three all deal with that particular issue, then I

2 guess I don't. 

3 Q. Okay. Let's take the first one, 3553. Would 

4 you look at the bottom of page 7 and the top of page 

5 8. 

6 A. Fine. 

7 Q. And read what the court has said there to 

8 yourself. By the "bottom" I mean the last whereas — 

9 A. Right, last whereas. 

10 Q. — not the footnote. 

11 A. Correct. 

12 I do not have a clear understanding what means. 

13 Certainly it can't mean that anyone who is a smoker 

14 has — all that burden, all costs go on to someone 

15 else, so I, no, I do not have a clear understanding 

16 of the meaning of that. 

17 Q. Let's look at the next exhibit, 3554. 

18 A. Yes. 

19 Q. And if you look first at the very bottom of page 

20 8 [sic. (6)] and carries over to the top of page 9 

21 [sic.(7)], where the court says, "WHEREAS, the 

22 indivisible injury rule does not require Plaintiffs 

23 to show damages are 'fairly distinct'; the burden of 

24 allocating the damages falls upon those who claim it 

25 can be disaggregated," that phrase, is that — 
1 MR. BIERSTEKER: I don't see it on this

2 page. 

3 MR. LOVE: We have the wrong exhibit. 

4 THE WITNESS: I have the wrong exhibit? 

5 You said page 8? 

6 MR. LOVE: I meant to say page 6 and I 

7 think over to page 7. 

8 THE WITNESS: I thought you said 8. I'm 

9 sorry. 

10 MR. LOVE: Yes. I meant page 6 carrying 

11 over to page 7. 

12 THE WITNESS: Yes. 

13 Q. And if you will also look at footnote 8 near the 

14 bottom of page 7 where the court says defendants have 

15 not offered evidence — 

16 A. Yes. 

17 Q. — do you see that? 

18 A. I see, yeah, where they misquoted me, I see 

19 that. 

20 Q. And then the final sentence of that footnote, 

21 "Plaintiffs here presented testimony that 

22 disaggregation would be impossible and not even 

23 Defendants' experts can identify a means of 

24 apportionment," do you see that? 

25 A. That's based on a misquote, they are misquoting 
1 me, my deposition. I can read it but it's wrong.

2 Q. And then if you just look at page 8, the next 

3 page, the third whereas clause. 

4 A. Yes. 

5 Q. Second portion of that says, Thus, impossibility

6 of apportioning damages in cases of this magnitude 

7 cannot be used as proof of lack of causation; if this 

8 were so — if this were not so, plaintiffs in complex 

9 cases would not have any remedy for injuries suffered 

10 and defendants would be rewarded despite their 

11 culpability. Do you see that? 

12 A. Yes, I do. 

13 Q. Do you have an understanding what the court has 

14 ruled in this order as it affects calculation of 

15 damages in this case? 

16 A. I believe I have some understanding. And there 

17 are a lot of, in my world, foreign technical words 

18 being used that probably have special meaning, but my 

19 understanding from these words is that the burden of 

20 proof on showing the effect of damages due to 

21 misconduct, that burden has been shifted to the 

22 defendants, I believe, and that saying it can't be 

23 done, it's impossible to do, is no excuse for not 

24 producing — for trying to do it. That cannot be 

25 used as a defendant, in fact can't be done. I never 
1 said it can't be done.

2 Again, that's a misquotation of mine, so — but

3 it seems to be — it seems to be a reasonable

4 statement, the last clause that you read, "the

5 impossibility of apportioning damages cannot be used

6 as a proof of lack of causation." That's okay.

7 Q. All right.

8 A. I understand that.

9 Q. Just to complete the series of orders here, if

10 we look at the last exhibit —

11 A. Sure.

12 Q. — 355 — lost my numbers —

13 MR. BIERSTEKER: 5.

14 Q. Five, 3555, and if you just look sort of in the

15 middle of the page where it says, "MOTION DENIED."

16 A. Yes.

17 Q. Read further on. It says, "NON-LIGGETT



18 DEFENDANTS' NOTICE OF MOTION AND MOTION FOR MISTRIAL 

19 OR LEAVE TO SUPPLEMENT EXPERT REPORT." 

20 A. Yes. 

21 Q. Now do you have any understanding as to the fact 

22 whether defendants offered to produce any damages 

23 models that did disaggregate smoking-attributable 

24 healthcare expenses into those that arguably were 

25 caused by defendants' alleged misconduct and those 
1 that arguably weren't caused by defendants' alleged

2 misconduct? 

3 A. Yes. My understanding is that some initial work 

4 was done by someone for the defendants. 

5 Q. Did you have any involvement in that? 

6 A. No, I did not. 

7 Q. Have you reviewed that effort? 

8 A. No, I have not. 

9 Q. You haven't seen Dr. Wecker's second 

10 supplemental report — 

11 A. No, I have not. 

12 Q. — dated March something, April something? 

13 A. I believe I may have had a two-minute 

14 conversation with Peter Biersteker about it at one 

15 time but that's all. I have not seen any documents, 

16 I have not talked to Wecker. 

17 Q. Do you understand the court ruled defendants 

18 could not introduce those models into evidence? 

19 A. Yeah, although it seems kind of contradictory. 

20 I understand that means they can't, although from 

21 this it says they must, but I'm not a lawyer so I 

22 don't know what's going on. They seem to be sort of 

23 contradictory, you must do it but you can't do it, 

24 but that's not my field. 

25 Q. If in fact the court has determined that no 
1 party may introduce any model that attempts that

2 disaggregation, would you be able to follow that rule 

3 in this case? 

4 MR. BIERSTEKER: I object. I don't know 

5 what you are trying to do. I mean, you know, it's 

6 clear that — it's clear Professor Rubin is not a 

7 lawyer, it's clear that he has an impression because 

8 you asked him to read stuff about some of the things 

9 that he has read, but I think we are probably well 

10 beyond the scope of the expertise and the scope of 

11 the second supplemental report that he submitted. I 

12 don't know what you are trying to do but I think 

13 these questions are — this is sort of can you follow 

14 the rules if the court tells you what the rules are, 

15 and the answer to that question is going to be yes, 

16 we follow the rules once the rules are clearly 

17 spelled out and the court tells us what the rules 

18 are, but I think — I think you are pretty far 

19 afield. 

20 BY MR. LOVE: 

21 Q. Professor Rubin, let me ask you a different 

22 question, then. Having seen these court rulings, 

23 have you had any opportunity to talk with them, to 

24 get a better understanding by talking to anybody 

25 about them? I'm not asking what the conversation was 
1 but have you had a chance to do that?

2 A. I've had brief conversations with Peter 

3 Biersteker about the meaning of them but I certainly 

4 can't see myself being in a position where I have to 

5 make some legal assessment of whether I can answer a 

6 question or not. I will answer all questions 

7 honestly that are put to me and if a question comes 

8 up that shouldn't be asked, presumably it's up to 

9 someone else to decide whether that question should 

10 have been asked. 

11 Q. After having seen the orders and having a 

12 conversation with Mr. Biersteker, did you go back and 

13 examine any of the opinions you have expressed in 

14 your expert reports in this case to determine whether 

15 or not those orders would affect those opinions? 

16 A. Well it — not affect the opinions in general at 

17 all but in the — the issues that are addressed, 

18 yes. Shall I try to clarify that? 

19 Q. Yes, if you could. 

20 A. When I wrote these — the original report, 

21 opinions about issues of causality, there were two 

22 kinds of causal questions I brought up. One was the 

23 ever smoking one and then there was the smoking — 

24 damages due to alleged misconduct, and in one of 

25 those examples, or maybe in both, I don't remember, 
1 in discussing those topics, I used examples, and one

2 of those, the examples, it had this, quote, death 

3 credit going in it. It had it running where the 

4 smoker died earlier, I think if he were without 

5 smoking would be in a nursing home and cost more. 

6 Now that example still works for the issue of 

7 causality. There is no problem with the example. 

8 But the point is, the court has said I cannot 

9 consider savings in costs that occur after this 

10 person would have died as a smoker, so what I would 

11 have done or could have done or should have done when 

12 I now think about it is the way I mentioned earlier 

13 in today's deposition, that I would have decomposed 

14 these two streams in the future to not carrying them 

15 on until he died, this person died under either 

16 scenario, that he kept going as long as he is alive 

17 in one scenario. Now the issue is, I want to stop as 

18 soon as the person dies under either scenario and 

19 only look at costs, the stream of costs up until the 

20 point in which he is alive under both scenarios. So, 

21 it would — it would — it would have changed my 

22 decision on how to address that causal question by 

23 decomposing the costs into two pieces, one piece 

24 which is when this person is alive under both streams 

25 — by "streams" I mean smoking and nonsmoking 
1 streams and thereafter — and the court has said

2 those costs thereafter may not be used in any

3 calculation. And that would work for both the ever

4 smoking and the misconduct version of the causal

5 question.

6 Q. Is the example you are talking about the one

7 that appears on page 7 of your first expert report,

8 trial Exhibit 2273?

9 A. Yes, that's the example I had in mind.

10 So now what I would do, I would say, okay,

11 here's the streams under world with smoking, here's

12 the stream under world without smoking, and the court

13 has decided that what I have to do is divide the

14 stream into two pieces, before 1980 and after 1980,

15 and the costs in the world with smoking in this

16 little example after 1980 are zero and the world

17 without smoking, they — they build up, and the court

18 has said you have got to stop counting dollars in

19 1980. You can count those if you want but you can't

20 use them in this trial, they can't be used to trade

21 off, so instead we have to start up with the date of

22 alleged misconduct, which I believe is this example.

23 I don't remember exactly, maybe it's 1955. No,

24 average smoker. I'm sorry, average smoker. It's

25 right.
1 Q. This example doesn't involve misconduct.

2 A. Now I remember that's right. This is ever 

3 smoking, so you build up all the healthcare costs 

4 from the time he was born until the time he died with 

5 the world with smoking and build up all the costs 

6 from the time he was born until the time — in a 

7 world without smoking, because — until he would have 

8 died with smokers. You have to shut him off at 

9 1980. You can't account any of those costs after 

10 that. It's a legitimate question but the court has 

11 ruled you can't use those costs in calculating those 

12 things. I understand that. That's a ruling the 

13 court has made. 

14 So now in hindsight, that example would have 

15 been clear if I had said that then, so that there are 

16 costs before that date and costs after that date, and 

17 there may be reasons why you would like to ask two 

18 different questions, not one just total costs but one 

19 total cost up to the point of demise. 

20 (Interruption by the reporter.) 

21 A. Up to the point of demise, death. 
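
The decomposition described in this answer, counting costs only while the person is alive in both worlds, can be sketched as follows. This is a hedged illustration, not the calculation in Trial Exhibit 2273: the dollar figures are invented, only the 1980 death year echoes the testimony, and the names `costs_with_smoking`, `costs_without_smoking` and `split_cost_streams` are hypothetical.

```python
# Hypothetical yearly healthcare costs, keyed by the years the person is alive.
costs_with_smoking = {1978: 1000, 1979: 3000, 1980: 5000}  # dies in 1980
costs_without_smoking = {1978: 500, 1979: 500, 1980: 500, 1981: 4000, 1982: 4000}

def split_cost_streams(real, hypothetical):
    """Split the comparison into a counted piece and an excluded piece.

    counted:  real minus hypothetical costs over years alive in both worlds.
    excluded: costs in years where the person is alive in only one world.
    """
    shared_years = real.keys() & hypothetical.keys()
    one_world_years = real.keys() ^ hypothetical.keys()
    counted = sum(real[y] - hypothetical[y] for y in shared_years)
    excluded = sum(real.get(y, 0) + hypothetical.get(y, 0) for y in one_world_years)
    return counted, excluded

counted, excluded = split_cost_streams(costs_with_smoking, costs_without_smoking)
print(counted)   # difference up to the point of demise
print(excluded)  # later costs that are set aside
```

With these assumed figures the counted difference through 1980 is 7500, while the 8000 of later costs in the world without smoking is set aside rather than used as an offset.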

22 Q. If you look at the example you gave on page 3 of 

23 your first report, Trial Exhibit 2273, can you tell

24 me if that example is affected at all in your mind by 

25 any of these court rulings we have just discussed? 
1 A. The example isn't but it — it may be that

2 aspects of — apparently aspects of trying to build 

3 the model to address that question apparently were 

4 not allowed. I think that was the last court order 

5 you showed me. So that the first attempt to try to 

6 do this was not allowed. I have no idea whether that 

7 means all attempts to try to answer the question 

8 that's being asked will be thrown out. That's for 

9 you guys to decide with the judge, or you guys 

10 divided with the judge. 

11 Q. Is it fair to say you don't know whether these 

12 orders affect this example on page 3 of Exhibit 2273, 

13 not the — 

14 MR. BIERSTEKER: Wait a minute. Can I ask 

15 that we segregate out the Exhibit 3552, which is the 

16 one that deals with the so-called death credit, from 

17 the wrongful conduct orders? Because I think — Well 

18 I would ask that you do that if you don't mind. 

19 Otherwise, it's ambiguous. 

20 Q. Let me ask you first, Dr. Rubin, do you believe

21 that the example on page 3 of Trial Exhibit 2273 

22 involves the death credit as we discussed? 

23 A. No. Because there is no death in the example. 

24 MR. LOVE: Does that satisfy you, Peter? 

25 MR. BIERSTEKER: It does. 




625 


1 Q. Now the question I'm asking is: The other three 

2 court orders that didn't deal with death credit at 

3 this time, that was the first order, but the three 

4 after we talked about, is it fair to say you don't 

5 know whether any of those three orders affect your 

6 ability to use the example here on page 3 at the 

7 trial? 

8 A. I would not — You are asking sort of a legal 

9 question. I can give you my lay interpretation of 

10 the legal question. 

11 Q. I'm just asking what you understand what you are 

12 committed to do and not committed to do at trial. 

13 A. I believe I'm still allowed to — If the 

14 question has to do with misconduct, this is a — is a 

15 way of thinking — the way of thinking about 

16 misconduct, and my lay understanding of what the 

17 court has ruled has been more of a burden of proof if 

18 you are going to do this analysis, the plaintiffs 

19 don't have to do it, and I guess secondly that the 

20 attempt to do some version of this and submit it was 

21 denied. And that's my total understanding. 

22 Q. Okay. 

23 A. I don't know why — I don't understand why it 

24 was denied, I don't know whether that denial means no 

25 further attempt can be made. I have no idea. 
1 Q. That's why I say it's fair to say you don't know

2 whether that ruling would extend to the example here 

3 on page 3 of Exhibit 2273? 

4 A. I guess I don't see how it could extend to that 

5 but again that's my naivete. 

6 Q. Is there anything in your second — your 

7 supplemental report, the one dated January 10 marked 

8 as Exhibit 3546, that you have gone back and thought 

9 about in light of the four court orders, either the 

10 death credit or these other three court orders? 

11 MR. BIERSTEKER: Object to the 

12 characterization "death credit." Go ahead. 

13 A. I don't think so, except that the — the second 

14 part where you talk about possible data sources for 

15 behavioral changes. 

16 Q. You are referring to section II of the report 

17 that begins on page 8? 

18 A. Section II for causal modeling which addresses 

19 behavioral changes as possible — as a possible 

20 result of alleged misconduct. If someone said you 

21 are not allowed to talk about anything about 

22 misconduct, you are not — and therefore they also 

23 said you are not allowed to talk at all about 

24 behavioral changes as a result of misconduct, then I 

25 guess those would not be germane. The propensity 
1 score stuff is completely relevant.

2 Q. And not affected by court order? 

3 A. Not affected by court order at all. 

4 Q. As far as you understand. 

5 A. As far as I understand, yes, that's right. 

6 Q. Have we now exhausted your — any new opinions 

7 you may have come to after reviewing the four court 

8 orders? 

9 A. As a result of reviewing the court orders, you 

10 mean? 

11 Q. Yes. 

12 A. Yes, I think we probably exhausted my knowledge 

13 of the law on such issues. 

14 Q. In that pre-designation of documents for your 

15 deposition, Exhibit 3547, if we look at item number

16 12, Heart and Stroke Facts, it says, 1966 statistics, 

17 Statistical Supplement, in this pre-designation. I'm 

18 sure that's a typographical error, supposed to say 

19 1996. 

20 A. Gee, I read the '66 one pretty carefully. 

21 That's not true. 

22 Q. The document listed below it, CN 000046, is in 

23 fact the 1996 statistical supplement, so I'm sure you 

24 got the right one; correct? 

25 A. Probably did. I did not check the date. 
1 (Plaintiffs' Deposition Exhibit 3556 was

2 marked for identification.) 

3 BY MR. LOVE: 

4 Q. Professor Rubin, I'll show you what we have 

5 marked as Plaintiffs' Exhibit 3556, the American 

6 Heart Association document entitled "Heart and Stroke 

7 Facts: 1996 Statistical Supplement," and ask if 

8 that's a document that you reviewed in preparation 

9 for your deposition. 

10 A. I did glance through this, yes. 

11 Q. And then in particular if you look at page 

12 number 16 of the document. There is page numbers in 

13 the lower left corner. 

14 A. Where is page 16? Sorry. 

15 Q. Sixteen. 

16 A. Yes. 

17 Q. And if you will see the first subtitle is 

18 "Cigarette/Tobacco Smoke" in the left column. 

19 A. Yes. 

20 Q. And the third bullet point says, 

21 "Smoking-related illnesses cost the United States 

22 about $50 billion annually in medical care"? 

23 A. Yes, I see that. 

24 Q. Did you see that in preparation for today's 

25 deposition? 
1 A. Yes.

2 Q. Do you have any opinion about that statement? 

3 A. I find most of this sort of — It's not anything 

4 like a scientific document. I mean, just look at 

5 page 1. It says, "If a heart attack doesn't kill 

6 you, you will recover and be fine." I've noticed 

7 that. And then it says — it has other information 

8 that says at the bottom just about right after that, 

9 "About two-thirds of heart attack patients don't 

10 make a complete recovery." Now how do those two 

11 things jive? You will be fine but you don't make a 

12 complete recovery? So I don't know what — I don't 

13 know what — it's something like an advertisement so 

14 it's — I don't know what to make of it. 

15 Q. I take it you have no opinions about this 

16 document. 

17 A. I read the — I read page 1. 

18 Q. Right. 

19 A. And my reaction was, "come on," so I have no 

20 opinion other than reading and sort of saying, well, 

21 it's words. They may get their numbers from 

22 somewhere but who knows what they mean. 

23 Q. Having reviewed this document, have you come to 

24 any further opinions about your work in the Minnesota 

25 case? 
1 A. None.

2 Q. Item 13 — Let's skip down to item 14. The 

3 Milliman & Robertson health risks and their impact on 

4 medical costs, item 14 in the pre-designation of 

5 documents for your deposition. Had you seen, 

6 reviewed that in preparation for today? 

7 A. Yes, I did, quickly again. 

8 Q. I'll be happy to show you the document if you 

9 would like but it may save time if I just ask you did 

10 you come to any opinions about that document based on 

11 your review? 

12 A. Well it's — My memory of it is it's a very 

13 simple descriptive presentation of risk factors and 

14 how they relate to healthcare costs, and it's 

15 indicated that here is a risk factor called drinking, 

16 which is — which is always good for you, and they 

17 have other risk factors, have diet, other risk 

18 factors, cigarette smoking, and they just sort of go 

19 one by one, saying how good or bad they appear to be 

20 for you in the very simple descriptive sense. I mean 

21 I could try to point out some things if you would 

22 want to give it to me. 

23 Q. If you have opinions about this study, I'd like 

24 to know what they are, so I'd be happy to show it to 

25 you. 
1 (Plaintiffs' Deposition Exhibit 3557 was

2 marked for identification.) 

3 BY MR. LOVE: 

4 Q. Professor Rubin, I'll show you what we have 

5 marked as Plaintiffs' Exhibit 3557, a study by 

6 Milliman & Robertson entitled "Health Risks and Their 

7 Impact on Medical Costs." Is that the report you 

8 reviewed in preparation for your deposition? 


9 A. It certainly looks like it.

10 Q. If you — if you came to any opinions about this


11 report based on your review, please tell me about 

12 them. 

13 A. About the report. 

14 Q. If you did. 

15 A. About this report. 

16 Q. Yes. 

17 A. Opinions about this report. 

18 MR. BIERSTEKER: This is fine. I'm just 

19 going to interject and say I don't mind you asking 

20 that question but if there is a specific portion you 

21 do want to ask him about, I think in fairness you 

22 ought to do that. If you generally are interested in 

23 knowing what his reaction to it was having quickly 

24 reviewed it, I don't mind that question; but if there 

25 is a specific thing you want to ask him about, I 
1 think you ought to do that.

2 Go ahead. 

3 A. Okay. For various risk factors such as smoking, 

4 weight control, exercise, alcohol use, driving 

5 habits, et cetera, they — eating habits, they do a 

6 calculation of for people with low risk on this risk

7 factor and elevated on the risk factor, having or not 

8 having it, what it does to total monthly claim costs 

9 and hospital and patient days and I guess some other 

10 services, physician services, and they show that in 

11 the sense of elevated risk, currently smokes, has 

12 higher costs. They show that weight control is 

13 probably about the same, maybe more serious for some 

14 things, less serious for others. Exercise is less 

15 related than smoking. It's good to have the risk 

16 factor of using alcohol, they have some discussion 

17 about why that's — why they don't like that, a 

18 pejorative way of saying it, but why they consider a 

19 risk factor something not to do despite the fact that 

20 the simple analysis shows it is a factor. And 

21 driving habits, they have a similar analysis. Eating 

22 habits are about like smoking roughly in this 

23 marginal sense. Stress is not good for you in this 

24 sense, mental health is mildly not good for you, and 

25 they just go through and show each of these one at 
1 time, how it relates, I guess, it's costs and

2 hospital stays and use of physicians, so it's — it's 

3 very — it's a coarse summary of how these one at a

4 time relate to these kinds of expenditures. 

5 Q. Do you have any more detailed opinion or further 

6 opinion about the portion of report that deals with 

7 smoking as the factor? 

8 A. More detailed opinion about — 

9 Q. Than you already told. 

10 A. I don't think there is anything new in here 

11 relative to all the other material that's been 

12 presented. I don't think it adds anything new if 

13 that's what you are asking. 

14 Q. I just wanted to know do you have any specific 

15 opinions about their discussion of smoking as a 

16 factor in causing expenditures and hospital stays and 

17 so on. 

18 MR. BIERSTEKER: Object to the form of the 

19 question. 

20 A. They — they do the most simple form of analysis 

21 and this — smokers claimed to be higher than 

22 nonsmokers and so forth but it's not controlling for 

23 anything, just a very simple description. 

24 Q. Did you — You told me earlier you reviewed the 

25 trial transcript of Dr. Zeger and Dr. Wyant; correct? 
1 A. Yes, I did.

2 Q. Do you recall either of them talking about this

3 particular study in their trial testimony?

4 A. I believe that Wyant referred to it. I'm not —

5 I don't remember whether Zeger did or not.

6 Q. I believe certainly Dr. Wyant did and we can

7 leave Dr. Zeger out, at least for the moment.

8 A. Okay.

9 Q. Do you recall Dr. Wyant using this study and its

10 results to compare to the numerical results of the

11 Zeger-Wyant-Miller work in this case?

12 A. I don't remember that specifically, no, I

13 don't. I'm not saying it's incorrect. I have no

14 specific memory of referring to this study. I do

15 remember he referred to things for support for his

16 position but I don't remember this study in

17 particular. Actually, do I have — Perhaps. I

18 really can't say whether it was this study.

19 Q. Okay. Do you recall Dr. Wyant comparing the

20 Blue Cross results with a Milliman & Robertson study

21 of people at Honeywell?

22 MR. BIERSTEKER: I think this is Chrysler.

23 Q. People at Chrysler. Sorry.

24 A. Not with that level of specificity I don't.

25 Q. You don't recall him looking at the percentage
1 of extra costs due to smoking in his work compared to

2 the percentage of extra costs due to smoking in the 

3 study we have marked as 3557? 

4 A. Again I remember the general statements where he 

5 compares things but I have no specific memory of 

6 comparing to this particular study. 

7 Q. Do you have any opinion as to whether this study 

8 can be used as a point of comparison with the results 

9 obtained by Drs. Zeger, Wyant and Miller? 

10 A. It seems so. 

11 Q. At least for the Blue Cross population? 

12 A. It seems so, completely unconditional, on 

13 comparing like with like, that — which is the — 

14 what I think everyone I think agrees for getting 

15 smoking-attributable expenditures. I don't see why 

16 it lends much support at all. 

17 Q. If you have one study that compares like with 

18 like, at least to the extent that Drs. Zeger, Wyant 

19 and Miller did in their report, and another study 

20 that, as you say, doesn't make — isn't much of a 

21 comparison to like to like and yet they — if they 

22 came out with very similar results, would that mean 

23 anything to you? 

24 A. Yes. Quite possibly in this case it means when 

25 one tried to adjust for the factors, didn't do a good 
1 job. 

2 Q. That's one possible? 

3 A. That's one possible. And because of the 

4 propensity score analyses, I suspect that's probably 

5 one of the major reasons why. 

6 Q. And any other meanings? 

7 A. Pardon? 

8 Q. Any other significance that would have to you? 

9 A. That all those background characteristics that 

10 people say should be controlled for and that in other 

11 cases people say even a bigger list should be 

12 controlled for and they are relevant, that they 

13 should be controlled for and they should make a 

14 difference because supposedly they are related, and 

15 it doesn't seem to — you still get roughly the same 

16 answer. It just doesn't lend any support, to me. 

17 Q. Do you know if in the Milliman & Robertson study 

18 they controlled for age and gender? 

19 A. They controlled for age and gender. I'd have to 

20 look at it to see whether they did. I don't believe 

21 they did but it's possible that they did do it. I 

22 would think they would control for gender but now 

23 that you mention it, it doesn't — I don't see it 

24 here. Do you want me to glance at it and see whether 

25 I can figure it out? 
1 Q. Sure. It might help you. Professor Rubin, if 

2 you look at the bottom of page 9. 

3 A. The copy I have, I can't read the numbers. Here 

4 it is, 9. Demographic characteristics? 

5 Q. Right, near the bottom of the first — the 

6 left-hand column. 

7 A. Okay. 

8 (Discussion off the record.) 

9 A. Okay. Yes. Evidently they did adjust by gender 

10 and by some categorization of age, which at least 

11 they don't say here on the bottom of page 9 how they 

12 did the age adjustment, whether it was the 35 to 60 

13 and 65 and older or some other categorization. 

14 Perhaps you can point me to where they say that and 

15 I'll try to then understand what they did. 

16 Q. At the moment, professor, I'm not sure it says 

17 in the report itself how they adjusted for age so I 

18 can't point to anything at the moment. 

19 A. Okay. 

20 Q. Is it — Knowing there is some adjustment for 

21 age and also adjustment for gender in the Milliman & 

22 Robertson study, is one possible reason that its 

23 results are similar to Zeger, Wyant and Miller's 

24 results is that the other factors besides age and 

25 gender don't matter that much in relation to smoking 
1 and healthcare expense? 

2 A. That's a possibility but I think it's — I think 

3 it's relatively remote based on all the discussion of 

4 the need to compare like with like and control for 

5 this — these other variables, for example in this 

6 model that Harrison I mentioned to you has 38 

7 variables there he thinks must be controlled for. 

8 Q. Is your opinion based on what other people like 

9 Harrison have done or is that based also on your own 

10 expertise? 

11 A. Well my — My opinion, relying on other people 

12 who know about how these other risk factors are 

13 related to disease and healthcare costs, I'm relying 

14 on other people for that and the fact that here we 

15 see eating habits is a bigger risk factor than 

16 smoking, et cetera, so we are relying on other people 

17 to make those kinds of statements. I'm relying on 

18 myself with respect to how one does adjustments for 

19 those kinds of factors appropriately in a situation 

20 where there are these background differences between 

21 the groups. That's like the propensity score thing 

22 to get started. You can't rely on linear models and 

23 you can have examples, not just linear models, models 

24 of the kind they are using, because it's quite 

25 possible that a good adjustment would result in a 
1 real change of the numbers. And that's based on work 

2 that started in my Ph.D. thesis under Bill Cochran in 

3 the late '60s, so I guess that's 30 years of work on 

4 this topic that suggests it can make a substantial 

5 difference how you try to adjust for variables like 

6 that that have the different distribution between 

7 exposed and non-exposed, smokers and nonsmokers. And 

8 if you don't do a good job of that, in terms of 

9 "good" being better techniques, better looks at the 

10 data, you can get nothing happening. It's completely 

11 unreliable. 

12 Q. Is it fair to say, then, your expertise is 

13 primarily in the area of how to adjust for background 

14 variables and other variables as opposed to choosing 

15 which variables should be included? 

16 A. Absolutely, absolutely, that's fair. That's why 

17 I've said earlier with respect to the collection of 

18 variables that have been selected by the Zeger et al 

19 team, I presented propensity score analyses. The 

20 propensity score analyses accepted that collection of 

21 variables, saying if those are the variables you want 

22 to adjust for, my expertise is telling you whether 

23 you have, what are the dangers and how to do it 

24 better, how to do it well and when to be worried. My 

25 expertise is not in saying which of those variables 

1 should be selected. They did that, the Samet-Zeger 

2 team. 

3 Q. If you look down at item number 17 on your 

4 pre-designation of documents for this deposition — 

5 A. Yes. 

6 Q. — you have the Nurses Health Study. Do you 

7 recall seeing that? 

8 A. I glanced at it, yes. 

9 MR. LOVE: Let me mark that. 

10 MR. BIERSTEKER: Let's take a break. Is 

11 that okay? 

12 MR. LOVE: Sure. 

13 (Recess taken from 3:00 to 3:15 p.m.) 

14 (Plaintiffs' Deposition Exhibits 3558 

15 through 3559 were marked for 

16 identification.) 

17 BY MR. LOVE: 

18 Q. Professor Rubin, I'll show you what we have 

19 marked as Exhibit 3558 — it's an article entitled 

20 "RELATIVE AND ABSOLUTE EXCESS RISK OF CORONARY HEART 

21 DISEASE AMONG WOMEN WHO SMOKE CIGARETTES" — and ask 

22 if you saw that in preparing for your deposition 

23 today. 

24 A. I saw it, I glanced at parts of it. I did not 

25 read it all. 
1 Q. Did you come to any conclusions about this 

2 report? 

3 A. It's another example relating ex — excess 

4 risks, in this case coronary heart disease, relating 

5 that to cigarette smoking. Nurses Health Study, I 

6 guess what it's called. I guess that's what it's 

7 called. 

8 Q. Yes, commonly called the Nurses Health Study, I 

9 believe. And you sort of described what it's about. 

10 My question is: Did you come to any conclusions 

11 about this report when you — just any conclusions at 

12 all about it, I guess. 

13 A. No conclusions that would have changed any of 

14 opinions or affected the reports that I wrote. 

15 Q. Did you come to any conclusions about whether 

16 this is something that Professor Zeger, Wyant and 

17 Miller could take into consideration in estimating 

18 damages in this case? 

19 MR. BIERSTEKER: I object to the form. 

20 A. State that again because I'm not quite sure. 

21 Q. Sure. Did you come to any conclusions about 

22 whether what's presented in this Nurses Health Study, 

23 also trial Exhibit 16,039, I guess, is something that 

24 Professor Zeger, Wyant and Miller could reasonably 

25 take into consideration in estimating damages in this 
1 case? 

2 MR. BIERSTEKER: Object to the form again. 

3 A. It's so subjective I don't know how to answer 

4 it, whether I know whether they could reasonably take 

5 into consideration. 

6 Q. Whether you think it's something anyone trying 

7 to estimate damages in this case reasonably take into 

8 consideration. 

9 A. Well in an indirect way, yes. I mean you could 

10 look at this and see what kind of other risk factors 

11 they — they want to control for, to identification 

12 of the kinds of variables that you might want to look 

13 at just in the same way I referred to other 

14 documents, other analyses where people have said 

15 here's the list of background variables you want to 

16 adjust for in order to get a SAF or attributable 

17 expenditures, you could look at this to see what they 

18 think you should control for to try to cover this. 

19 Q. That's as far as your analysis of this report 

20 went in this case; correct? 

21 A. Correct. 

22 MR. BIERSTEKER: If you have a specific 

23 portion you would like to ask him about, you may. 

24 MR. LOVE: No. I think that's fine, 

25 Professor Rubin. 
1 THE WITNESS: Okay. 

2 Q. I'll show you what we have marked as Plaintiffs' 

3 Exhibit 3559. It's an article titled "State 

4 Estimates of Medicaid Expenditures Attributable to 

5 Cigarette Smoking, Fiscal Year 1993," by Leonard 

6 Miller and other authors, and ask if you reviewed 

7 that in preparing for your deposition today. 

8 A. I reviewed this in something like the way I 

9 reviewed the previous one. 

10 I should have said in the previous one I didn't 

11 really read it very carefully so it could be there is 

12 some things in there that would be more revealing, 

13 but I read through it very quickly. I didn't see 

14 anything. 

15 Q. That's referring to the Nurses Health Study? 

16 A. That's right. I just wanted to clarify my 

17 answer to the last question. 

18 Q. Now we are on to Exhibit 3559. 

19 A. Correct. 

20 Q. And you also read that? 

21 A. Quickly. 

22 Q. Very quickly? 

23 A. Right. Because it's — That's right, I read it, 

24 read it quickly. 

25 Q. Did you come to any opinions about this report 
1 that are different than the opinions you have 

2 expressed about the Zeger-Wyant-Miller reports in 

3 this case? 

4 A. Well the analysis is only very cursorily 

5 described here and it — it really didn't generate 

6 anything new and different to me than the issues that 

7 have already been discussed. 

8 Q. Are you — 

9 A. Again, if you want to refer to something in 

10 particular, I may be able to address it, but it's a 

11 version on the same thing that we have seen a lot 

12 of. 

13 Q. Are you familiar with the publication that this 

14 appears in, Public Health Reports? 

15 A. No, I'm not. 

16 Q. I take it having reviewed this exhibit quickly 

17 over the weekend you — it hasn't changed any of your 

18 opinions or given you any insight into any new 

19 opinions in the Minnesota case? 

20 A. Correct. 

21 Q. Thank you. 

22 (Discussion off the record.) 

23 Q. Professor Rubin, in your previous deposition 

24 back in October of this case, you were asked 

25 questions about the Framingham study. Do you recall 
1 that? 

2 A. Yes. 

3 Q. And — 

4 A. Not with great specificity but I recall in 

5 general. 

6 Q. And that study was marked during your deposition 

7 and it's also marked as Trial Exhibit 2269. That's 

8 the document I'm referring to, sir. 

9 A. Yes. 

10 Q. My question today is whether since that 

11 deposition in early October 1997 you have done any 

12 further review or analysis of the Framingham study. 

13 A. No, I have not done anything. My memory is that 

14 we looked at the — that there were two issues that 

15 arose in that deposition. One was probably how to 

16 handle missing data, and they don't in a way that — 

17 my memory wasn't particularly satisfactory, nor did 

18 they do any comparison of the adjustment for the 

19 background variables between the smoking and 

20 nonsmoking groups. But that's my memory from last. 

21 This is the first I've seen it since that deposition 

22 and I haven't read it before the deposition, either, 

23 so this is just my memory from having seen it then. 

24 Q. Do you recall from reviewing the trial 

25 transcripts of Dr. Zeger, particularly Dr. Wyant, 
1 that they, Dr. Wyant at least, used the Framingham 

2 health study as a point in comparison to the work he 

3 did in this case? 

4 A. Yes, I remember seeing that. 

5 Q. And my question is: You had not reviewed, first 

6 of all, the Framingham health study since seeing that 

7 trial transcript? 

8 A. Correct. 

9 Q. And I take it you don't have any opinions about 

10 the way in which that study was used as a comparison 

11 by Dr. Wyant? 

12 A. Well in some sense, yes, because it did not — 

13 my memory is that this paper does not do any 

14 comparison of the distributions of background 

15 variables between smokers and nonsmokers. 

16 Q. Like propensity score analysis? 

17 A. Anything to just reveal how different they would 

18 be, that's correct, like the propensity score 

19 analysis is a very efficient way of doing such a 

20 comparison, so they are relying on the models to the 

21 adjustments. I don't know from this article how far 

22 apart they are, but from the supplemental analysis 

23 that I did use in propensity scores in the 

24 supplemental report, which makes it look like there 

25 really are substantial differences between smokers 
1 and nonsmokers and many background characteristics, 

2 so that you can't trust those kinds of models. I 

3 guess I'm now even more dubious about whether this 

4 article, for example, did in fact do a control for 

5 those — reliable control for those background 

6 variables. There is no evidence that it did, that it 

7 looked to see what problems might be there. 

8 Q. Is it your opinion that any study that compares 

9 smokers and nonsmokers first must do something like a 

10 propensity score analysis to compare the 

11 distributions between smokers and nonsmokers before 

12 making that kind of comparison? 

13 A. If the desire of the analysis is to control for 

14 these full collection of background variables, then 

15 it's imperative you look to see whether such a 

16 comparison is even feasible, and then if it's 

17 feasible, how sensitive it might be to linear 

18 modeling or log-linear modeling assumptions — 

19 (Interruption by the reporter.) 

20 A. — instead of just throwing the data into a 

21 computer program and pushing a button and seeing the 

22 answers that come out. 

23 Q. Any other opinions about using the Framingham 

24 health study as a comparison to the work of doctors 

25 Wyant, Miller and Zeger in this case? 
1 A. No. 

2 One more point of clarification perhaps on the 

3 question before that last one, your question was 

4 worded in the comparing smokers and nonsmokers and my 

5 answer really is not only comparing smokers and 

6 nonsmokers but just generally when comparing an 

7 exposed group and unexposed group, and you are trying 

8 to make that comparison controlling for a collection 

9 of background variables. You have to look to see 

10 what the distribution of background variables is for 

11 the exposed and non-exposed groups to see whether 

12 such a comparison can be made and how sensitive the 

13 adjustment might be to modeling assumptions in 

14 general. 

15 Q. Now you had — in your answer said something 

16 about looking to the distribution of the background 

17 characteristics between the smoker group and 

18 nonsmoking group, first to see whether any comparison 

19 can be made, whether it's feasible, I think you said. 

20 A. Yes. 

21 Q. And is there some test in propensity score 

22 analysis that must be satisfied before any type of 

23 comparison of the two groups can be valid or 

24 feasible? 

25 A. Let me address that with a trivial example 
1 first. Let's suppose you want to compare smokers and 

2 nonsmokers adjusting for gender, sex, and you found 

3 that all the smokers were males in the data set and 

4 all the nonsmokers are female, and they say, okay, 

5 now I will adjust for sex. Well that's — obviously 

6 you can't adjust for sex. The smokers are males and 

7 the nonsmokers are females. Let's suppose instead 

8 that they were almost all males, there was just one 

9 female who was a smoker and there is just one male 

10 who is a nonsmoker, and obviously it becomes more 

11 difficult to rely on those results because there is 

12 noise. Now that's a simple case where there is one 

13 variable like the one we had in this former exhibit. 

14 In complicated situations, there are many variables 

15 and the way in which they differ can be much more 

16 complex, and to understand the ways in which those 

17 distributions overlap or don't overlap, to understand 

18 that, you have to do some analysis to see to what 

19 extent you can — you can adjust for those kinds of 

20 differences. 

21 Another example would be if age is the 

22 background variable and all the smokers were between 

23 18 and 35 and all the nonsmokers were between 34 and 

24 65, to do a linear modeling kind of adjustment or 

25 log-linear modeling kind of adjustment means drawing 
1 straight lines and reading the data pretending you 

2 know what's going to happen, so it's a complete 

3 extrapolation. You have to look at those kinds of 

4 background differences and understand to what extent 

5 those adjustments are feasible and to what extent 


6 they rely on assumptions that you might be willing to 

7 make, but you have to explore the sensitivity of the 

8 answers to other assumptions, at least understand — 

9 you have to understand what you are doing, not just 

10 put it in a computer program and push a button. 

11 Q. Maybe it will help if we actually look at your 

12 supplemental report. 

13 A. Sure. 

14 Q. If I can find it in the pile here. It's Exhibit 

15 3546. 

16 A. Okay. Yes. 

17 Q. And for instance on page 3. 

18 A. Yes. 

19 Q. Near the bottom you have a paragraph labeled 

20 number 1. 

21 A. Right. 

22 Q. And that's one of the conditions that you say 

23 you must obtain — 

24 A. Right. 

25 Q. — or exist in order to use linear regression in 
1 comparing two groups like smokers and nonsmokers; is 

2 that right? 

3 A. Correct. 

4 Q. And here you say the difference in the means of 

5 the propensity scores of these two groups being 

6 compared must be small; correct? 

7 A. Yes. 

8 Q. And then — 

9 A. Unless — 

10 Q. And then in parentheses — we will get there, I 

11 hope. 

12 A. Okay. 

13 Q. Then in parentheses you say, e.g. To me, that 

14 means for example. Is that what you meant by 

15 "e.g."? 

16 A. Yes. For example, yes. 

17 Q. The means must be less than half a standard 

18 deviation apart. 

19 A. Correct. 

20 Q. And then your situation as being benign and give 

21 three conditions, as I understand it, that must 

22 simultaneously exist for the situation to be, quote, 

23 benign, end quote? 

24 A. Right. 

25 Q. Is — is this standard deviation being less than 
1 — well start that question again, professor. 

2 Does the difference in the means of the 

3 propensity scores being less than half a standard 

4 deviation apart, is that test half a standard 

5 deviation apart or is that an example of what you 

6 mean by "small"? 

7 A. Just an example of what I mean by "small" 

8 because it depends on other kinds of distribution 

9 characteristics, how skewed things are, other 

10 things. It certainly — Certainly I would never do a 

11 test, for example, are the means less than half 

12 standard deviation apart, and if the test says I 

13 believe they are, then you are fine. That's not good 

14 statistics. These are guidelines. They are not 

15 hard-and-fast rules. 

16 Q. And if the difference in the propensity means, 

17 the propensity score is greater than half, it still 

18 might be appropriate to use linear regressions in 

19 comparing groups like smokers and nonsmokers? 

20 A. Yes. If — it can — Yes, it can be, and these 

21 are — these other conditions help make it easier. 

22 The work that I've done that's reported, the early 

23 work reported in here I did jointly with Cochran, 

24 suggests that if you have these symmetric, normal 

25 distributions with equal variance in the two groups 
and no one group is bigger in size than the other by 
very much, that — those linear extrapolations tend 
to be pretty reliable. 

Q. Let's assume we don't meet the three, all three 
of the conditions to make something which you call 
benign — 

A. Uh-huh. 

Q. — on page 3 of your supplemental report, so we 
don't meet those three tests simultaneously. 

A. Right. 

Q. Is there some upper bound on the difference in 
the means of the propensity scores between two groups 
being compared above which you say you just can't 
make the comparison, it's not feasible? 

MR. BIERSTEKER: May I help? 

MR. LOVE: Go ahead. 

MR. BIERSTEKER: I really mean, you mean 
feasible at all using any method or feasible using 
only regression methods? 

MR. LOVE: I mean feasible without using 
any method. 

Q. That's what I understood you to say before. 

A. That's right. 

Q. Did I properly understand what you said before? 
A. Yeah. See, that depends a lot on the setting. 
1 Let's suppose, to take a real setting where there is 

2 an exposed group that's very small, some severe 

3 exposure, and they have background variables measured 

4 on them and the control group has all the background 

5 variables measured on them but it's huge by 

6 comparison. The example I'm thinking about in my 

7 mind is a study, for example, of 200 kids who are 

8 exposed to barbiturates in utero, the Danish study 

9 involved in a group for many years, and there are 200 

10 exposed but there are 8,000 potential controls, and 

11 when you look at the distribution on propensity 

12 scores and things like that, you find out that the 

13 groups are widely separated. Women who got 

14 barbiturates when they were pregnant tend to be 

15 different from women who do not get barbiturates when 

16 they are pregnant. These are women in the '40s and 

17 '50s, I think, long time ago. But because the 8,000 

18 group is so large, there is a tail of that 

19 distribution who look like the exposed, the 

20 barbiturate, so among the 8,000 who did not get 

21 barbiturates, there is a subgroup of hundreds of 

22 them, very small proportion but that overlap the 

23 distribution of the exposed and look like them. 

24 There is a setting where you have, I believe, that 

25 maybe the standard deviation, the difference in — 
1 between means and units of standard deviation was 

2 three quarters, one, maybe even larger. I don't 

3 remember exactly now. But because it's ratio of the 

4 control pool to the treatment pool, you are 

5 interested in the treated people, the exposed people, 

6 and there are a lot of people down there in the pool, 

7 so you can make the comparison. If the two groups 

8 had been the same size — another point also, the 

9 variance, I believe, in the control pool was greater 

10 so they had a long tail anyway. Not only were there 

11 many of them but they had a longer tail coming in 

12 with the variance, but instead they had the same 

13 variance and same sample size. We would find them 

14 very far apart and would be very difficult to 

15 draw any conclusions from that study. That's an 

16 example where the technique used was matched 

17 sampling, where you matched each person who is 

18 exposed to somebody who is not exposed based partly 

19 on propensity scores, based partially on some other 

20 technologies to pair them up and set them aside. And 

21 then you throw away the irrelevant people from the 

22 control group because they just aren't like it, are 

23 not like the exposed people. 

24 Q. Is it fair to say that in that study that you 

25 are referring to, is that the Danish study? 
1 A. Danish data set. 

2 Q. Danish data set. That the distributions of 

3 background characteristics in the two groups, the 

4 exposed and unexposed, were not nearly symmetric? 

5 A. I don't know if they were symmetric or not. 

6 Q. What I'm trying to find out, Professor Rubin, is 

7 whether that particular example would meet your test 

8 for benign, because you said the second part of the 

9 test, the distribution to the background 

10 characteristics in both groups have nearly the same 

11 variance, I think you said that was true. 

12 A. That would be — That's right. That would be — 

13 That would not be benign but — and regression 

14 adjustment, we didn't do regression adjustment in 

15 that study. That would be disaster. 

16 MR. BIERSTEKER: That's why I asked to 

17 clarify the point. You are using these points for 

18 feasibility generally when they are addressed to 

19 feasibility with regard to the regression. I'm 

20 sorry. 

21 Q. That is my question. I did mean to ask is there 

22 some point where the difference in the means of 

23 propensity scores expressed in standard deviations 

24 apart gets to be so wide that no statistical means is 

25 available to make a valid comparison between the two 
1 groups in your opinion? 

2 A. I gave you an example where they were quite far 

3 apart yet something that could be done, not by 

4 regression analysis but using these other matching 

5 techniques, because there is a little subgroup of 

6 people who really did look like the exposed and we 

7 could pick off those and throw off the rest. If we 

8 did a regression analysis in that study, it would 

9 have been a disaster, and there are analogous studies 

10 in economics, for example, where that's been 

11 documented, where they had a parallel randomized 

12 experiment and used regression to do these type of 

13 adjustments, and the regression methods are 

14 completely unreliable with respect to getting the 

15 real answer but the propensity matching methods, 

16 matching in general, got much closer to the 

17 randomized experiment answer. 

18 (Interruption by the reporter.) 

19 Q. My question, Professor Rubin, was: Even using 

20 these other techniques, does there get to be a point 

21 where, I don't know, where the difference in the 

22 means is two standard deviations or four standard 

23 deviations? Is there some point where in your 

24 opinion there is a cutoff and you say really no 

25 statistical methods can be used to compare those two 
1 groups in a valid way? 

2 A. Yes, if the groups don't overlap. If you have 

3 an exposed group here and the control group has no 

4 people who look like the exposed, you can't do it. 

5 You can't do it using any technique. 

6 Q. If there is no people who look like it. 

7 A. If there are no people, that's my example of all 

8 the smokers are male and all the nonsmokers are 

9 female. Ah! Here is an answer that adjusts for 

10 sex. You can't do it. 

11 Q. But I take it, then, there is no standard in 

12 terms of how many standard deviations apart the two 

13 means are that you can use as saying, there, once you 

14 get past that number, you can't use any statistical 

15 technique to compare these two groups in a reliable 

16 or valid way? 

17 A. If the question is about any statistical 

18 technique, and that's right, if you are drawing — 

19 trying to draw a conclusion about the exposed and 

20 what effect that would — what effect exposure had on 

21 them relative to being controlled, and if you have an 

22 enormous enough control group so there are control 

23 people that look like those treated group, then you 

24 can use those control people. So let's suppose they 

25 are two standard deviations apart, huge, but you have 
1 10,000 times as many control people as you do treated 

2 people. It's possible that the tail of that 

3 distribution of control people will overlap in that 

4 region. 

5 Q. Okay. 

6 A. Then you can make comparisons, but to do a 

7 linear regression adjustment would take in that case 

8 these 200 exposed and these 8,000 control and throw 

9 them all in the computer program and push a button, 

10 and in such a case the linear regression adjustment, 

11 the slopes of those lines would be completely 

12 determined by the control people because there are 

13 8,000 of them relative to the 200 treated, and then 

14 you would be sending straight lines down into a 

15 region where it's irrelevant, so it — I'm trying to 

16 be helpful in answering your question. 

17 Q. All right. I think I understand — 

18 A. Okay. 

19 Q. — the main point you are trying to make. Let 

20 me ask you again about the Danish data set you were 

21 talking about. 

22 A. Yes. 

23 Q. I just wanted to make sure I did understand 

24 whether or not that data set met your test for being 

25 benign or not. 
1 A. No, it did not. 

2 Q. What part didn't it meet? 

3 A. Well the difference in means, I believe, was — 

4 was greater than a half. 

5 Q. Well that's not part of the test for benign, is 

6 it, or am I misreading this? 

7 A. Unless the situation — That's right, the 

8 benign. Okay. So first of all, it doesn't even meet 

9 the first part. 

10 Q. Which is? 

11 A. Which is, must be less than half a standard 

12 deviation. It doesn't meet that part. But benign 

13 has three conditions, that's right. The description 

14 of background characteristics are nearly symmetric. 

15 Do I remember whether it's symmetric or not? I think 

16 not. I think it was not symmetric. Distribution of 

17 the background have the same variance? No, they did 

18 not. The sample size the same? No, one was 8,000, 

19 the other 200. So, there was no way in the world I 

20 was going to do linear regression in this setting, 

21 yet doing something else, which is this matching 

22 idea, could — could work because one group was so 

23 much bigger than the other that you could actually 

24 make some inferences about the — for the exposed 

25 people for that group of people — using that group 
1 of controls who were down in the same region of this 

2 factor space, the space of background variables, as 

3 the exposed. 

4 Q. I think sometimes you referred to this larger 

5 group, sometimes referring to the tail or as a 

6 reservoir of people? 

7 A. Yeah, a control reservoir, that's right. 

8 Q. In situations where you don't have this large 

9 control reservoir so that your two groups are in at 

10 least in the same order of magnitude of the same 

11 size — 

12 A. Right. 

13 Q. — is there some level where the different — 

14 above which the difference in the means of the 

15 propensity scores gets to be where you can't make any 

16 valid or reliable statistical comparison between the 

17 two groups? 

18 A. What's a hard-and-fast rule on that? I don't 

19 think there is a really hard-and-fast rule but 

20 anything approaching a standard deviation is pretty 

21 shaky because you are relying — the groups are the 

22 — Let me stop. 

23 If you want to restrict inferences to the region 

24 where the two groups overlap, that's one possibility, 

25 so let's suppose they are far apart so, you know — 
1 it's an age — let's do the age example; okay? The 

2 exposed are between 18 and 40 and the unexposed, the 

3 control, are between 30 and 60, and the standard 

4 deviation in a half difference in the means of the 

5 groups, that's substantial. I can't compare all of 

6 one group with all of another group. It's just too 

7 much extrapolation involved. But if I look at 

8 between ages 30 and 40, I find both smokers, exposed 

9 and unexposed, the control people there, so I'm going 

10 to say in this little example that let me talk about 

11 the effect of exposure but only on those people who 

12 are age 30 to 40. I have a shot at doing that 

13 because there is overlap with that subgroup of 

14 people. But if I want to say what's the effect of 

15 exposure for the people who are between 18 and 25, 

16 there are no control people anywhere near there. 

17 It's all fantasy built on some model that you just 

18 don't — can't have any faith in extrapolating that 

19 far. See what I'm saying? Let's suppose we have 

20 people 18 to 40 who were exposed to, would be 

21 military people exposed to, I don't know — 

22 Q. I think I understand. 

23 A. They overlap. You can't — It's like the sex 

24 thing again, they are all males who are smokers and 

25 all females that are nonsmokers, what can you say 
1 after adjusting for sex? There is a whole collection 

2 of better techniques than linear regression which 

3 sometimes include linear regression as a component. 

4 There is nothing wrong with linear regression 

5 locally, making local adjustments, within cells where 

6 people are relatively compatible and where the 

7 techniques include better modeling techniques, they 

8 include subclassifying, they include discarding some 

9 people who grew out of the range of the other group 

10 and restricting inferences to a smaller group. I've 

11 been writing about this stuff for 30 years so there 

12 is a lot of stuff I've written about. 
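
[Editorial illustration, not part of the testimony.] The age example in the answer above can be sketched in a few lines of Python. The ages and the helper name `common_support` are hypothetical, chosen only to echo the 18-to-40 and 30-to-60 ranges mentioned; the sketch shows restricting a comparison to the range where both groups have people, and nothing more.

```python
def common_support(exposed, control):
    """Return the (low, high) range covered by both groups, or None if they do not overlap."""
    low = max(min(exposed), min(control))
    high = min(max(exposed), max(control))
    return (low, high) if low <= high else None

# Hypothetical ages: exposed between 18 and 40, controls between 30 and 60.
exposed_ages = [18, 22, 25, 31, 35, 40]
control_ages = [30, 33, 38, 45, 52, 60]

region = common_support(exposed_ages, control_ages)
low, high = region

# Keep only the people inside the shared range; the rest would need extrapolation.
exposed_kept = [a for a in exposed_ages if low <= a <= high]
control_kept = [a for a in control_ages if low <= a <= high]
print(region, exposed_kept, control_kept)
```

In this toy data, the exposed people aged 18 to 25 are dropped because no control falls near them, which is the situation the answer describes as requiring too much extrapolation.
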

13 Q. And to say. 

14 A. Lot of stuff to say about it, yeah. And a lot 

15 of paper — I don't know how many pages — on 

16 specifically doing that better. And frankly, it's 

17 frustrating to read something like this report that 

18 is on a very serious issue, a very serious problem 

19 that's completely unaware of more than a quarter 

20 century of literature that goes back to before me, 

21 goes back to Cochran and his writing in the 

22 literature, including in the context of some smoking 

23 examples. It's sort of frustrating. 

24 Q. Have you found other articles in the reported 

25 literature or other studies being done by people that 
1 frustrates you as well in the sense people are doing 

2 regression analyses on comparing two different groups 

3 without having something like a propensity score 

4 analysis in the first place? 

5 A. Sure. I hope I'm making inroads into those 

6 areas. I think I am in some fields, with other 

7 people as well, not just me. 

8 Q. I take it that continues today, in 1998, there 

9 are still studies coming out or articles being 

10 written that use regression analysis, linear 

11 regression analysis comparing two groups without 

12 having verified that the propensity scores are — 

13 meet your criteria and that the groups are close 

14 enough to be compared? 

15 A. Right. I want to be clear that's not just — I 

16 think propensity scores is a very efficient way of 

17 looking at the overlap in distribution. If someone 

18 did a careful job of looking for the overlap in 

19 distribution without using propensity scores, that's 

20 fine. 

21 Q. You are not saying that's the only way to check 

22 for the distributions? 

23 A. The critical issue is the overlap in 

24 distributions and this propensity score technology 

25 appears to be a very effective way of exploring that 
1 and being a component of a successful adjustment for 

2 background variables, if it's possible. 

3 Q. And my question is whether people are writing 

4 articles, whether they are using propensity scores or 

5 some other means of checking the distributions of the 

6 two groups to compare. I understand your answer to 

7 be yes, you see them happening and yes, they are 

8 frustrating, to you anyway? 

9 A. Yes. I wish that better statistics, better 

10 science was being done than is being done. 

11 Q. Do I also understand that, you know, if the two 

12 groups that are being compared are fairly similar in 

13 size and you are not willing to restrict your 

14 analysis to a fairly small subset of one of those 

15 groups that happens to overlap the other, that if the 

16 difference in the means of the propensity scores is 

17 about a standard deviation or more, that in your 

18 opinion really no statistical means are available to 

19 compare those two groups? 

20 A. Not necessarily none but linear regression 

21 certainly cannot be relied on, and if you tried to do 

22 a more robust, in the statistical sense a technical 

23 term, robust, then the answer of that will not be 

24 completely wrong with minor deviations in the 

25 assumption, so robust to model deviations. What 
1 would happen is, the standard errors, the honest 

2 confidence errors, will grow extremely large if you 

3 are in a situation like that. So that will be 

4 reflected, the real standard errors will reflect the 

5 fact that you can't say much when there is not much 

6 overlap whereas the standard errors from regression 

7 analysis are not honest that way. 

8 Q. So then I guess I should understand that even if 

9 you have two groups that are roughly the same size 

10 and there are difference in the means of propensity 

11 scores is even more than the standard deviation, 

12 there may be ways to validly and reliably compare 

13 those groups other than linear regression? 

14 A. Correct. There may be ways, depending on the 

15 actual sample size and how much overlap there is, 

16 it's possible to do something, but whatever it will 

17 be, whatever you are doing that's valid will yield 

18 much larger standard errors in general than would be 

19 the answer based on some assumption of straight 

20 parallel lines, which underlies almost all these 

21 kinds of models. 

22 Q. If we look at the NMES data set that you 

23 performed your propensity score analysis on, which is 

24 the one that includes the imputations from NMES that 

25 includes the imputations with Drs. Zeger, Wyant and 
1 Miller, I take it it's your opinion that the 

2 difference in the means of the propensity scores for 

3 the smokers and nonsmokers was too great to use 

4 linear regression to compare those two groups. Is 

5 that right? 

6 A. That in combination with the — with the 

7 variance ratios, yes, that's my opinion, and also as 

8 stated by Cochran as his opinion from the tables that 

9 we did. 

10 Q. Is it your opinion that other statistical 

11 techniques could be used to compare the smokers and 

12 the nonsmokers in that NMES data set in a way that 

13 would yield what you consider to be reliable and 

14 valid results? 

15 A. It's possible that — that there still is enough 

16 overlap in the distributions despite this large 

17 initial differences in bias and variance ratios, that 

18 we could take a look — take a look at that to see 

19 what those other more robust methods would lead to 

20 for adjusting for background variables. It might be 

21 that's impossible. I don't know, I haven't done that 

22 next step in the analysis to actually try to do such 

23 an adjustment. 

24 Q. So sometimes it's possible and sometimes it's 

25 just not possible to make that comparison by any 
1 statistical technique; is that correct? 

2 A. That's correct. If the — Again it's the male, 

3 all smokers are male, all the — 

4 Q. Not just in that extreme examples that you have 

5 given but in a real data set like the NMES data set 

6 where there is certainly some overlap between smokers 

7 and nonsmokers, certainly their ages overlap to a 

8 large extent there, their genders overlap to a large 

9 extent and so on? 

10 A. Right. You can have an overlap in each 

11 individual variable and have no overlap in the other 

12 direction. I can draw a picture if you would like to 

13 see. 

14 Q. No, I understand what you are saying. 

15 A. Okay. They can — It's a combination. 

16 Q. Right. 

17 A. It's the — 

18 Q. If you have 10 different categories, there may 

19 be no smokers who meet, you know, the same — 

20 A. Right. 

21 Q. — set of all 10 of those criteria and there are 

22 smokers that do meet all 10 of those criteria, so if 

23 you want to use all 10, you can't compare any smokers 

24 to nonsmokers because there aren't any that meet all 

25 10 criteria? 
1 A. Correct, that's correct. That description, 

2 though, would imply I really care about the 10-way 

3 interactions. I don't care about that. I'm not 

4 going to hold this analysis to that standard. That's 

5 too hard. But I would want to hold it to a standard 

6 at least for the main effects and two-way 

7 interactions. So if you have 10 variables, I would 

8 like in each two-way table to have overlap. I will 

9 not require overlap in all, in each cell, that you 

10 have both smokers and nonsmokers, because that would 

11 be a standard that would be too hard for any 

12 observational study to meet and would be just silly 

13 requiring that here. But I am going to hold it to a 

14 standard that says it should be — there should be 

15 some overlap at least in the main effects and some 

16 two-way interactions that people regard as 

17 important. 
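
[Editorial illustration, not part of the testimony.] The point that two groups can overlap on every individual variable and still fail to overlap in combination can be shown with a toy example. The variable names `old` and `male`, the data, and the helper `shared_values` are all hypothetical.

```python
def shared_values(group_a, group_b, columns):
    """Value combinations on `columns` that occur in both groups."""
    cells_a = {tuple(row[c] for c in columns) for row in group_a}
    cells_b = {tuple(row[c] for c in columns) for row in group_b}
    return cells_a & cells_b

# Hypothetical data: smokers are young men or old women,
# nonsmokers are young women or old men.
smokers = [{"old": 0, "male": 1}, {"old": 1, "male": 0}]
nonsmokers = [{"old": 0, "male": 0}, {"old": 1, "male": 1}]

# Each variable on its own shows full overlap between the groups.
print(sorted(shared_values(smokers, nonsmokers, ["old"])))
print(sorted(shared_values(smokers, nonsmokers, ["male"])))

# The two-way table has no cell that contains both smokers and nonsmokers.
print(sorted(shared_values(smokers, nonsmokers, ["old", "male"])))
```

Running the same check over every pair of background variables would correspond to the two-way standard described in the answer; checking all variables at once would be the stricter cell-by-cell standard the witness calls too hard for an observational study.
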

18 Q. Is it important to look at all the two-way 

19 interactions among the variables or just certain 

20 ones? 

21 A. For things like that, I would rely on someone 

22 else's opinion about whether these things interact, 

23 whether age interacts with sex, old males are 

24 different from old females. I suspect that's true 

25 but I'm not a doc so I want someone else to tell me 
1 whether that's important. In some of these analyses 

2 I have seen them put in age and age squared, because 

3 there tends to be not a linear effect on healthcare 

4 costs, even on a log scale, nothing much happens 

5 between some age and 50, 60, and then things start 

6 taking off, so that's a non-linear effect. Is that 

7 non-linear effect different for men and women? Yes, 

8 probably. Aging is different for men and women so 

9 you want an age by sex interaction. How about for 

10 pre- and post-menopausal women? Yeah. Is there 

11 difference in age and eating habits? I don't know. 

12 Age and exercise? I don't know. I'm not on top of 

13 that literature. The selection of which of those 

14 variables should be in there, the selection of which 

15 variables should be in there should be made by 

16 someone other than me, maybe with some advice from me 

17 what the consequence of adjusting for that are or how 

18 the two groups differ in those background 

19 characteristics, but the particular selection of 

20 which variables to control for and which functions, 

21 like the squares, should be made by someone else. 

22 Q. Well in the Table 4 of your supplemental 

23 report — 

24 A. Uh-huh. 

25 Q. — called the "Interaction Model" — 
1 A. Right. 

2 Q. — with the results, does that look at all the 

3 two-way interactions among the 25 background 

4 variables? 

5 A. Yes, it does. That's right. And that was just 

6 to illustrate. I did three versions of it: One was 

7 just main effects, one was this interaction model 

8 which put a huge number of variables — I guess 304 

9 is what the title says — and then this Table 3, 

10 which is something in between. 

11 Q. Is it your testimony that these interactions, 

12 you would not want to necessarily examine all of them 

13 if you were actually trying to figure out whether 

14 that test is met by the NMES data set. You want to 

15 talk to some professionals in healthcare and so on to 

16 find out which are the important interactions? 

17 A. Correct. And I would also end up probably 

18 including some variables in here. 

19 Q. In here? 

20 A. In the correct model, in the correct model. 

21 They are not even included in the interaction model. 

22 The interaction model only has two-way interactions. 

23 In your example of 10 variables, you had 10-way 

24 interactions in there. Now 10 way seemed too high to 

25 me but it certainly seems plausible to me there is an 
1 age, by sex, by exercise interaction. That's a 

2 three-way, and that's not in here. Perhaps it should 

3 be. If the amount of exercise has a different effect 

4 for old men than old women, than young men and young 

5 women, and that would be a three-way interaction. 

6 That's not in here. And if a health professional 

7 said that's important, there is an important 

8 difference of the — it's important — how much you 

9 exercise on your healthcare costs makes a difference 

10 in each of these four cells of old, young men and 

11 women, there should be three-way interaction in here 

12 and that's not in here, and that could create an even 

13 bigger difference. That's why — 

14 These analyses are meant to really show that the 

15 main effects is to show just what they did, just what 

16 the Zeger team did, and that's the minimum amount of 

17 looking at the overlapping distribution. Table 2 and 

18 Table — I'm sorry. Table 3 and 4 were just 

19 indications of other things that you could put in but 

20 in fact in the step-wise for the two-way interactions 

21 and all the two-way interactions, but in fact the 

22 right thing to do would be to talk to healthcare 

23 officials to find out which two ways are important 

24 and the other, which three ways should be in there as 

25 well. 
1 Q. So the one model you would definitely keep after 

2 talking to a healthcare professional, so on and so 

3 forth, I take it would be the main effects model? 

4 A. I wouldn't say one I'd keep but that just shows 

5 how — that shows how bad the adjustments that the 

6 Zeger team did are, even if you only were worried 

7 about the variables that are in the Zeger team model 

8 in their sample form. 

9 Q. My question, if you were looking at this NMES 

10 data set, asking yourself can you use linear 

11 regression to compare them or not, am I understanding 

12 correctly that you don't need to talk to healthcare 

13 officials or anyone else — 

14 A. Correct. 

15 Q. — to do the main effects model? 

16 A. Correct. They select — That's their model. 

17 That's exactly what they did and this says even if 

18 all the data in NMES imputed are correct, that these 

19 are the only variables you want to adjust for, then 

20 you cannot trust what they did. 

21 Q. And the Tables 3 and 4, the interaction models, 

22 the regular and step-wise models, are things that you 

23 would want to consult with a healthcare professional 

24 or some other professionals first before actually 

25 performing that type of analysis to determine whether 
1 the NMES data set will allow these kinds of 

2 comparisons be made on a linear regression basis? 

3 MR. BIERSTEKER: Excuse me just a minute. 

4 I object to the form of the question. 

5 A. If these two tables, Table 3 and 4, are relevant 

6 to the question as follows. Here are the background 

7 variables we want to adjust for, but let's suppose 

8 instead of just their main effects we want to worry 

9 about other possible functions in them. Two simple 

10 kinds of functions are all interactions and another 

11 would be a step-wise, stepping through and taking the 

12 ones that appear to be important, here's the kinds of 

13 results you would get. Now in fact a healthcare 

14 expert might say, well, really we should be including 

15 more variables than this that reflect certain 

16 three-way interactions. 

17 Q. But we don't know what — 

18 You don't know, anyway, Professor Rubin, what a 

19 healthcare professional is going to say because you 

20 haven't asked one yet; true? 

21 A. Correct. But apparently they think the 

22 variables they put in their main effects model are 

23 relevant because that's what they did. 

24 Q. But we don't know whether healthcare 

25 professionals would think that all the two-way 
1 interactions that flowed from those 25 variables are 

2 important; right? 

3 A. We don't know that but we don't know whether 

4 they understand that the model they fit only adjusts 

5 for these main effects and does not do what you 

6 described, which is adjusting for each cell. It's 

7 not doing that. If they want to do that, they have 

8 to put in 25-way interactions because it has to have 

9 a parameter for each one of those cells. You know 

10 what I'm saying. They product all these variables 

11 out. But the way the models are described often in a 

12 linear regression is deceptive. I put in age in the 

13 model, yeah. I put in sex, yeah. I put in exercise, 

14 yeah. And I've adjusted for age, sex, exercise just 

15 as if I formed those three-way — three-way table. 

16 That's the way it's often described. I'm saying 

17 that's wrong. It doesn't do that. All it does is 

18 adjust for these marginal effects on those 

19 variables. It doesn't do what you were describing, 

20 which was put them in the same cell. So that's what 

21 these other tables are doing, is trying to look — 

22 let's suppose you want them in the same cell but just 

23 the two-way faces, not really in the same cell, 

24 because I think that's what people think a linear 

25 model doing, doing more than what it does. 
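
[Editorial illustration, not part of the testimony.] The gap between a main effects model and a cell-by-cell adjustment can be made concrete by counting terms. This sketch assumes every background variable has exactly two levels, which the testimony does not state, so the counts are illustrative only; `term_counts` is a hypothetical helper.

```python
from math import comb

def term_counts(k):
    """Term counts for k two-level background variables (intercept excluded).

    Returns (main effects only, main effects plus all two-way interactions,
    number of distinct cells in the full k-way table).
    """
    main_effects = k
    with_two_way = k + comb(k, 2)
    full_cells = 2 ** k
    return main_effects, with_two_way, full_cells

for k in (3, 10):
    print(k, term_counts(k))
```

Under that assumption, ten variables give 10 main effect terms and 55 terms with all two-way interactions, against 1,024 distinct cells, which is why entering each variable once is far from forming the full table.
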
1 Q. But if we assume you described to healthcare 

2 officials what your models really do, they may tell 

3 you that they want to examine some of these two-way 

4 interactions and not others. We don't know what they 

5 are going to tell you; right? 

6 A. Correct. 

7 Q. And so the propensity score analysis that we 

8 would do in the interaction and step-wise interaction 

9 models could be significantly different than what's 

10 shown here in Tables 3 or 4 of your supplemental 

11 report? 

12 A. Significantly different, I doubt that. Could 

13 show bigger differences, that's true. 

14 Q. Could show smaller differences; true? 

15 A. We know it's bounded from below by the main 

16 effects model, so it's at least that different. It 

17 could only get more different as you add more 

18 variables. 

19 Q. Is it the difference in the means that's bounded 

20 from below or is it the difference in the ratio of 

21 variances, or what's bounded? 

22 A. It's really the — the means in the sense — 

23 See, how to say this? If I have a group and they 

24 differ on some set of variables, like the main 

25 effects model, and now I want to bring in another 
1 variable, and let's suppose that variable, after 

2 adjusting for these ones that are already in there, 

3 has exactly the same distribution for smokers or 

4 nonsmokers, then bringing that variable in will not 

5 move these groups farther apart. They have the same 

6 distribution, both groups. But let's suppose it has 

7 a difference, even after adjusting for these other 

8 variables. All it can do is make it bigger. All it 

9 can do is make them farther apart if analyzed the 

10 right way. 

11 Q. If we compare — The letter capital B that you 

12 use in these Tables 2, 3 and 4 stands for the 

13 difference in the means of the propensity scores of 

14 the two groups set forth in how many standard 

15 deviations away they are; correct? 

16 A. Correct. 
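
[Editorial illustration, not part of the testimony.] A minimal sketch of the two summaries discussed here: B, the difference in mean propensity score expressed in standard deviations, and the variance ratio. The testimony does not say which standard deviation is used to scale B or which group's variance goes on top of the ratio; the pooled form and the ordering below are assumptions, and the scores are made up.

```python
import statistics

def bias_and_variance_ratio(scores_a, scores_b):
    """Return (B, R) for two lists of propensity scores.

    B: difference in means in standard-deviation units.
    R: variance of group a divided by variance of group b.
    """
    var_a = statistics.variance(scores_a)
    var_b = statistics.variance(scores_b)
    # Assumption: scale by the square root of the average of the two variances.
    pooled_sd = ((var_a + var_b) / 2) ** 0.5
    b = (statistics.mean(scores_a) - statistics.mean(scores_b)) / pooled_sd
    return b, var_a / var_b

def within_quoted_thresholds(b, r):
    """True if B is under half a standard deviation and R lies between 1/2 and 2."""
    return abs(b) < 0.5 and 0.5 < r < 2.0

# Hypothetical propensity scores for two small groups.
smokers = [0.2, 0.4, 0.6, 0.8]
nonsmokers = [0.1, 0.3, 0.5, 0.7]

b, r = bias_and_variance_ratio(smokers, nonsmokers)
print(round(b, 3), round(r, 3), within_quoted_thresholds(b, r))
```

The thresholds are the ones quoted in the testimony (half a standard deviation for B; one half and two for the ratio). The check covers only those two numbers, not the symmetry or sample-size conditions also mentioned.
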

17 Q. If we look at the main effects model, Table 2, 

18 and we compare the figures under the B column there 

19 with those under Table 4, the interaction model — 

20 A. Right. 

21 Q. — do we in fact see that for at least the first 

22 — really for every group, all six age/gender 

23 groups, that B is smaller in the interaction model 

24 than in the main effects model? 

25 A. What's — what's going on here is, there is some 
1 confounding with the ratio variances that's taking 

2 place so that — let's look, for example, at the 

3 first — the first row for females. See, B is the 

4 number of standard deviations and as you bring in 

5 more variables, the ratio, the standard deviation can 

6 change within a group, and so B is — remember in 

7 your example how we calibrated by dividing by the 

8 number of standard deviations. 

9 Q. Right. 

10 A. As you bring in more variables, the 

11 standardization itself of this propensity score can 

12 change the ratio of standardization can change. The 

13 thing you are normalizing downstairs can change. For 

14 example, in the first row you see that's right, B 

15 goes down, but see how the ratio of standardization 

16 is completely different, goes from 71 to 21? 

17 Q. Seventy-one is a better ratio for your purposes 

18 in terms of having what you call — 

19 A. Benign. 

20 Q. — a propensity score that is consistent with 

21 linear regression; correct? 

22 A. Correct, correct. 

23 Q. Let's look at the second row, though, the 

24 females 35 to 64. 

25 A. Right. 

STIREWALT & ASSOCIATES 

P.O. BOX 18188, MINNEAPOLIS, MN 55418 1-800-553-1953 




1 Q. Isn't it true that — that not only do their B 

2 go down, which makes them more consistent with linear 

3 regression, but also that their ratio of the 

4 variances gets closer to 1, which also makes it more 

5 amenable to the linear regression? 

6 A. That's right. And then I think if you look to 

7 see what's going on in the direction that's called, 

8 technically, orthogonal to the propensity score, 

9 which has to do with the right-hand part of the 

10 table, things are looking pretty severely different 

11 there in terms of the — see, there is no mean bias 

12 in these other directions. I've taken care of that 

13 by taking out the — by looking at residuals off the 

14 propensity score, and what you see going on there, 

15 there are a lot of directions in a geometric sense 

16 directions, combinations of variables where the 

17 variance ratios are severely different. So I think I 

18 was fairly careful in my answers. I was talking 

19 about things would get worse if you add more 

20 variables. I cannot say the B will get bigger or R 

21 will get bigger. I hope I was careful. I said the 

22 distribution would become farther apart and the way 

23 they get farther apart can turn around different 

24 ways, so sometimes will be reflected in the bias. 

25 Often in very simple cases that would generally be 
1 true where the groups are symmetric and have the same 

2 variances, but it can be reflected in other ways in 

3 these directions or orthogonal to the propensity 

4 score. 

5 Q. And that's what you call the variance of the 

6 residuals remaining after adjusting for propensity 

7 scores? 

8 A. Yes, that's correct. Because these things, 

9 these residuals, after adjusting for propensity 

10 scores, are themselves particular combinations of the 

11 variables that were in the model and if it turns out 

12 that the outcome, the expenditures, is a function of 

13 a variable in that direction, then although the 

14 variable has the same means, the variances are 

15 entirely different and you get a position where 

16 linear extrapolation can be — linear modeling, 

17 linear extrapolation is very non-robust, even if the 

18 means are the same, because they can be sensitive to 

19 a little bit of curvature in their response surface 

20 that rises at one end, does not rise at the other 

21 end. 

22 Q. While looking at Table 2, the main effects model 

23 results, in that last grouping on the right called R 

24 for covariates after adjustment, that's what we have 

25 been talking about as the orthogonal? 
1 A. Right, orthogonal. That means after I've taken 

2 out the propensity score, in a regression sense, in 

3 some sense it's regression sense, just looking at the 

4 distribution of these variables after having pulled 

5 out the propensity score, so all of these residuals 

6 which is described in the table have the same mean 

7 effectively in the smoking and nonsmoking groups but 

8 their distribution is still different and how they 

9 differ is — I'm just summarizing — is differing by 

10 their variance ratio. 

11 Q. If we look at that table, there is five columns 

12 to that part of the table? 

13 A. Correct. 

14 Q. And I take it the middle column between a ratio 

15 of four-fifths and ratio of five-fourths is what you 

16 would consider to be — 

17 A. Pretty benign. 

18 Q. — good? 

19 A. Yeah, that's right. 
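
[Editorial illustration, not part of the testimony.] One way to picture the right-hand part of the table: take a covariate, remove the part explained by the propensity score, and compare the spread of what is left between the two groups. The testimony does not say how the report computed these residuals; the pooled straight-line fit, the direction of the ratio, and the handling of values exactly on a boundary are all assumptions, and the data are invented.

```python
import statistics

def residuals_on_score(x, score):
    """Residuals of x after a least-squares straight-line fit on score."""
    mean_x, mean_s = statistics.mean(x), statistics.mean(score)
    sxx = sum((s - mean_s) ** 2 for s in score)
    slope = sum((s - mean_s) * (v - mean_x) for s, v in zip(score, x)) / sxx
    return [v - (mean_x + slope * (s - mean_s)) for s, v in zip(score, x)]

def residual_variance_ratio(x, score, is_smoker):
    """Variance of smokers' residuals divided by variance of nonsmokers' residuals."""
    res = residuals_on_score(x, score)
    smk = [e for e, flag in zip(res, is_smoker) if flag]
    non = [e for e, flag in zip(res, is_smoker) if not flag]
    return statistics.variance(smk) / statistics.variance(non)

def variance_ratio_bin(r):
    """Place a ratio into one of five columns (boundary handling is assumed)."""
    if r <= 0.5:
        return "<= 1/2"
    if r < 0.8:
        return "1/2 to 4/5"
    if r <= 1.25:
        return "4/5 to 5/4"
    if r <= 2.0:
        return "5/4 to 2"
    return "> 2"

# Hypothetical covariate values, propensity scores, and smoking flags.
x = [1.0, 2.0, 4.0, 3.0, 5.0, 9.0]
score = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
is_smoker = [True, False, True, False, True, False]

ratio = residual_variance_ratio(x, score, is_smoker)
print(round(ratio, 3), variance_ratio_bin(ratio))
```

Repeating this for every covariate and counting how many land in each column would give a row shaped like the five-column display described above.
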

20 Q. Where did you — How did you determine that 

21 four-fifths should be the lower bound of that range 

22 and five-fourths should be the upper bound of that 

23 range? 

24 A. Not based on anything that has been published so 

25 far. The stuff that's been published concerns the 
1 less than half, greater than two. The four-fifths 

2 and five-fourths is based on a mental combination of 

3 various things I've seen over the years. It's not — 

4 I couldn't point you to any particular place where 

5 the four-fifths and five-fourths came out. 

6 Q. But in the published literature there is 

7 information about this R for the covariates after 

8 adjustment being problematic if it's lower than a 

9 half or greater than two; is that correct? 

10 A. Right. Because — But you won't find the 

11 literature after adjusting for covariates. What that 

12 shows, if you have a background variable, which each 

13 of these covariates after adjustment is, is just a 

14 combination of background variable, if you have a 

15 background variable, even for which there is very 

16 little bias in the means, almost no bias in the mean 

17 but the variance ratio is bad, like greater than half 

18 or less than half, are you in trouble using linear 

19 regression? The answer is yes. 

20 Q. Greater than half are — 

21 A. Greater than half or greater than two, less than 

22 half — it's just symmetric. It's a ratio, the same 

23 thing, one goes one way, the other goes the other 

24 way. 

25 Q. The variance ratios are problematic when they 
1 are less than half or greater than two? 

2 A. Right. I apologize if I didn't say that. 

3 Q. If we look at Table 2, the table for R for 

4 covariates after adjustment, if you look at the ones 

5 — there are none greater than two, is that correct, 

6 on this table? 

7 A. Correct. 

8 Q. If you look at the ones that are less than or 

9 equal to a half — 

10 A. Yes. 

11 Q. — just take the first row, females 19 to 34, 

12 there was one such covariate; right? 

13 A. Right. 

14 Q. Do you know how much less than a half its ratio 

15 was? 

16 A. No, I don't. I mean I — I think at one time I 

17 was thinking about doing this part of the table in a 

18 way by actually showing percentiles of those Rs, 

19 those ratios, that would give specific numbers, and I 

20 think I concluded, and I don't remember why, that 

21 displaying it this way was much more transparent, 

22 instead of having all these point seven three twos 

23 and point oh two ones. I don't remember exactly 

24 why. 

25 Q. But anyway, you don't know how much less than a 
1 half that one particular covariate was? 

2 A. No. 

3 Q. Assuming it was a ratio? 

4 A. Yes. 

5 Q. Do you know what covariate that was, which 

6 variable it was? 

7 A. No, but I could find out. But remember, after 

8 the adjustment, it's now a combination of lots of 

9 covariates, so it could be something like age 

10 adjusting for the propensity scores, could have all 

11 kinds of other things built into it. 

12 Q. Is there anything that you provided to the 

13 plaintiffs in this case that would allow them to 

14 determine what that variable or relationships of 

15 variables is that represents the one that had an R 

16 for covariates after adjustment after less than a 

17 half? 

18 A. If Raghunathan provided the disk that did the 

19 analysis, then yeah, you can look at that and figure 

20 it out. 

21 Q. Is that a disk that was provided with the 

22 supplemental report or disk that was provided back 

23 when we received exhibit — 

24 A. This would be supplemental report. I didn't do 

25 any of these covariates after adjustment until I did 
1 this report, so that would be with the supplemental 

2 report. 

3 I did consider other ways of doing this 

4 right-hand part of the table as well using things 

5 that I thought would be just be more confusing but in 

6 some ways would be — not only were sophisticated but 

7 a little bit sharper, having to do with something you 

8 don't want — I don't think you want to hear about 

9 it. I'll talk about it if you want to. I think this 

10 is — I regard this as a relatively transparent way 

11 to describe what's going on with these other 

12 variables, "other" meaning other than the propensity 

13 score. 

14 Q. And if you have, you know, one covariate after 

15 adjustment that falls outside of what you consider to 

16 be a reasonable range, does it make sense to examine 

17 what that covariate is and ask healthcare 

18 professionals, or whatever if you are looking at 

19 smokers or nonsmokers, whether that factor really is 

20 something that was considered to be a significant 

21 factor in looking at the relationship between smoking 

22 and healthcare costs? 

23 A. Yes, I think that makes sense. 

24 Q. So you would want to know what they were, the 

25 covariates that fell outside of the range and examine 
1 those? 

2 A. That's not a bad idea. It's also true, though, 

3 that the selection of covariates was made by the 

4 Zeger team, presumably after some thought, and they 

5 wanted to adjust for these, these variables, and so 

6 if after doing the analysis you say, woops, that one 

7 looks like trouble, I now want to pretend like I 

8 never wanted to do that, that's not quite fair. 

9 Q. Well this is doing it in an intellectually 

10 honest way. You may have included variates that you 

11 thought were marginal initially and one turns out to 

12 have — falls outside the range on your test for R 

13 for covariates after adjustment. Then it may be that 

14 that variable really is not thought to be 

15 particularly well associated with smoking and not 

16 really part of any causal pathway, or associative 

17 pathway for that matter, but was included anyway for 

18 whatever reason, because somebody else included it. 

19 A. Yes. It's hard to do that intellectually in an 

20 honest way but with that adjective, sure, that's okay 

21 to do. But everybody knows if you have an analysis 

22 you want to get through to the end and if there is 

23 something standing in the way can I find a reason, 

24 can I rationalize why that shouldn't have been there 

25 after having decided it should be there, now it's 
1 creating problems, can I find a reason. I can write 

2 a paragraph why I should take it out. 

3 Q. Would it also be something that would be 

4 appropriate to, if you do take it out, make it 

5 explicit that's what you did and explain what the 

6 analysis that led to that decision was? 

7 A. Absolutely. The analysis, not only statistical 

8 analysis, but intellectual analysis that led you to 

9 take that out afterwards. It's better to do that 

10 a priori, like exclude 10-way interactions, exclude 

11 8-way interactions, exclude two-way interactions that 

12 you don't think have any medical consequence so that 

13 you don't have to make any kind of intellectual 

14 argument later and say I really am that honest. It's 

15 better to do that in advance than after. 

16 But you're correct, that one could look at that 

17 and say that's a problem, can I now make an argument 

18 for why I do not have to adjust for that even though 

19 I said I should at the beginning. 

20 Q. All right. We can turn back to the first page 

21 of your supplemental report. Under the heading Roman 

22 numeral I, Propensity Score Methods, near the bottom 

23 of the first paragraph you are saying without such an 

24 analysis of these distributions of smokers and 

25 nonsmokers, one cannot have any confidence in the 
1 Zeger report's regressions' ability to adjust 

2 reliably for background differences between smokers 

3 and nonsmokers; is that correct? 

4 A. Correct. 

5 Q. Are you saying that it's impossible for the 

6 Zeger report regressions to have produced valid 

7 results or proper results? 

8 A. Am I saying it's impossible that the numbers 

9 that they get are the numbers that you would get 

10 having done something — something that is correct? 

11 Is it impossible those two numbers are the same? 

12 Q. Yes. 

13 A. No, I'm not saying it's impossible those two 

14 numbers are the same. I'm saying the process is not 

15 correct. My analogy is that you are in a restaurant 

16 and there is somebody completely drunk and he wants 

17 to drive home, see. Is it impossible that he will 

18 get home safely without having an accident? Here's 

19 someone else who is sober and he is going to drive 

20 home, you say is it impossible that they will both 

21 get home at the same time and without having an 

22 accident? No, it's not impossible, but certainly I 

23 would have more faith in the sober person reliably 

24 driving home than the person who can't even stand 

25 getting behind the wheel and flailing his way home. 
1 It's not impossible he will get there but I don't 

2 have any confidence. 

3 Q. That's what I'm trying to understand. In your 

4 opinion you are saying the methods they used aren't 

5 correct in your opinion but I take it you can't give 

6 an opinion as to whether the results they obtained 

7 are correct results or incorrect results? 

8 A. I can give you an opinion that the same analogy 

9 with the drunkard, that I think it would be very 

10 unlikely the drunkard would get home safely. I think 

11 it's quite unlikely looking at these results that the 

12 answers are correct and it's the same analogy. 

13 Q. By "these results" you mean propensity — 

14 A. Yeah, the overlap distribution of these 

15 background variables. 

16 Q. Then the next paragraph, bottom of page 1, you 

17 say, "The statistical literature warns that 

18 regression analysis cannot reliably adjust for 

19 differences in background characteristics when there 

20 are substantial differences in the distribution of 

21 the background variables in the two groups"; correct? 

22 A. Correct. 

23 Q. Now can you identify any statistical literature 

24 that sets forth that warning other than literature 

25 that you've authored or was authored by your thesis 
1 advisor, I guess you call him, Professor Cochran, or 

2 one of your graduate students? 

3 A. That warns about regression adjustment? Yeah, 

4 there are other literature. Can I recall them right 

5 now? There is some literature in economics that 

6 worry about model specification, the whole collection 

7 — Let me be clear. There is a specific problem of 

8 using linear regression to adjust for differences 

9 like this in an observational study. There is a 

10 broader question of relying on linear modeling to 

11 draw conclusions, and for that broader question there 

12 is enormous literature of worrying about the 

13 non-robustness of linear models just in general in 

14 many fields. It's enormous in statistics. When 

15 focusing on the more specific question of using those 

16 kinds of linear models to make adjustments in 

17 observational studies, the literature is smaller, and 

18 then you have to look then in subfields where people 

19 are doing this kind of analysis. What I mean by 

20 "this kind of analysis," taking an observational 

21 study, not random, observational studies, trying to 

22 compare exposed, unexposed, adjusting for a 

23 collection of background variables. 

24 Q. That was the — 

25 A. Context? 

1 Q. — context in which my question was meant, 

2 Professor Rubin, in using statistical — using 

3 regression analysis, linear regression, to adjust the 

4 background variables. 

5 A. There is this book that I referred to I believe 

6 in here that was — I think Hauck was one of the 

7 editors, von Dam. Where did I have this? 

8 Q. The bottom of page 4 of your report? 

9 A. Right, Anderson, Auquier, Hauck, Oakes, Vandaele 

10 & Weisberg, Statistical Methods for Comparative 

11 Studies, that has that kind of information in it. 

12 Q. What kind of information does it have? 

13 A. Information that linear modeling adjusting can 

14 be unreliable even on log-linear — linear-log or 

15 logit model as well as linear assumption. I referred 

16 to that. There are articles in — recent articles in 

17 economics literature where that arises as well. 

18 There is a growing awareness of the problem and, 

19 as you probably realize, the advent of computers made 

20 it very easy to dump data in, push a button and 

21 getting something out. It's a very fast way of 

22 getting publications. And they are — the dangers — 

23 although the dangers of doing analyses like that have 

24 been known for 30 years, there is still a temptation 

25 to do an analysis that does it because the answers 
1 come out right away. 

2 Q. Does the Anderson reference at the bottom of 

3 page 4 of your report describe propensity scores as 

4 you discussed them in your supplemental report? 

5 A. Not exactly as I described it but warns about 

6 the lack of overlap and distribution, what it can 

7 do. A propensity score paper wasn't published until 

8 1983, which is after this book was published, so if 

9 they were — if they talk about those sorts of 

10 things, which is sometimes called discriminant 

11 matching, or above that it's — they talk about 

12 overlap and distribution issues, but they certainly 

13 do not use propensity score jargon because it didn't 

14 exist then. 

15 Q. Can you identify for me today, from your memory, 

16 obviously, any other article or book besides the 

17 Anderson reference at the bottom of page 4 where 

18 someone other than you or Professor Cochran or one of 

19 your graduate students describes this problem of 

20 using any type of linear regression when the 

21 background — when the — without comparing the 

22 distribution of the background variables first? 

23 A. I believe there is some in economics, there is 

24 certainly one that was a thesis that came out a 

25 couple, three years ago in the department of 
1 economics at Harvard. I think the article is in its 

2 final stages of being accepted for publication in the 

3 Journal of the American Statistical Association by a 

4 couple former graduate students at Harvard. 

5 Q. Now were they in the department of statistics in 

6 Harvard or some other — 

7 A. Economics. 

8 Q. Economics. So they weren't students of yours, 

9 then. 

10 A. No. I talked to them but that's — they weren't 

11 mine. 

12 I'm sure there must be a variety of papers that 

13 aren't coming to mind but I can certainly talk to 

14 epidemiologists who I would regard as being aware of 

15 this issue of modeling assumptions and sensitivity to 

16 modeling assumptions, because it motivates all sorts 

17 of techniques that try to get away from explicit 

18 modeling assumptions. Some of the stuff I like, some 

19 of the stuff I don't like, but they try to be more 

20 robust. Sometimes it goes under general estimating 

21 equations approach in doing these kinds of 

22 adjustments, trying to get away from reliance on 

23 specific functional and form specifications. In 

24 economics, it's often known as functional form 

25 specifications and how you have to worry about that. 
1 I could try to do a much better job of accumulating 

2 some references on that. 

3 You say and my students, I was thinking Paul 

4 Rosenbaum but I guess he doesn't count despite that 

5 he has been a full professor at Wharton for 10 years, 

6 he doesn't count. 

7 Q. He certainly was a student of yours. We don't 

8 let people forget that, do we? 

9 A. Evidently not. He has to be ashamed of it, 

10 that's even worse. I'll have to call him tonight and 

11 tell him. 

12 Q. So as far as your memory sitting here today, 

13 there is nothing other than — than the Anderson 

14 reference at the bottom of page — 

15 A. There is a number of vague things; I can't 

16 remember the specific references. 

17 Q. I understand. But you can't identify an article 

18 or book? 

19 A. Right. 

20 Q. At the moment. 

21 A. It's like asking did you know driving drunk is 

22 bad for you? Yes. Can you give me a reference for 

23 that? How many references are there? I don't know. 

24 There are lots of, there must be many of them that 

25 make this point because it is a really rather obvious 
1 point, and I'm not being very good at recovering 

2 them. 

3 Q. This point isn't as obvious to most people as 

4 driving drunk is dangerous. 

5 A. Correct. 

6 Q. Turn to page 2 of your report. Professor Rubin, 

7 and at the bottom you have a quote from an article or 

8 book that Professor Cochran wrote in 1965; is that 

9 correct? 

10 A. Yes, I think that's an article; correct. 

11 Q. Now in there he is talking about whether you can 

12 trust regression analysis. In fact, regression 

13 adjustment is in square brackets in that quote you 

14 put in? 

15 A. Correct. 

16 Q. What did you mean by putting that in brackets 

17 there? 

18 A. He doesn't say that. He says none of the 

19 methods, but the article is entitled "Analysis of 

20 Covariance: Its Nature and Uses." Analysis of 

21 covariance is another name, older name for the kinds 

22 of regression adjustments that are done here. He 

23 refers to the fact that there is this background 

24 variable X and there is an outcome variable Y, and 

25 the way in the old days, pre-computer days, you did 
1 this adjustment was to do what's called an analysis 

2 of covariance, which is basically an analysis of X 

3 times Y, of the covariance between X and Y. It's an 

4 old name, I think it's due to Fisher. So in an 

5 experiment or study where there is just one X and one 

6 Y, the old name that goes back to that, to probably 

7 the '30s or '40s, is called analysis of covariance, 

8 the use of regression adjustment in an experiment to 

9 do exactly the kind of adjustment we talked about 

10 here is called analysis of covariance. 

11 Q. So it's your understanding that, although 

12 Professor Cochran's book didn't require regression 

13 adjustment not being trusted to remove all the bias, 

14 that's what he meant based on the title and the 

15 general nature of the article? 

16 A. Correct, the title of the article, and my memory 

17 is that in fact this may have appeared in the special 

18 issue of Biometrics devoted to the analysis of 

19 covariance. I'm not sure about that but I think so. 

20 Q. Now you never know for sure as a statistician, 

21 do you, that any statistical method has removed all 

22 the bias? 

23 A. That's correct. 

24 Q. In fact, you don't — you never know for sure as 

25 a statistician that any statistical method has 
1 removed any of the bias; correct? 

2 A. That I don't understand. 

3 Q. I mean, how do you know for sure that — that 

4 the statistical method has removed bias from the 

5 problem if you don't really know what the truth is? 

6 A. Okay. Let's go back to your example. In 

7 exhibit — on the floor over there. 

8 Q. Okay. Exhibit 3551. 

9 A. Good. Now here there is initial bias between 

10 smokers and nonsmokers due to gender; correct? 

11 That's the way we drew it. There is initial bias to 

12 the gender and in fact smokers are more heavily male 

13 and nonsmokers are heavily female. 

14 Q. If that's what you mean by bias, yes. 

15 A. That's the only thing I understand. If you have 

16 another meaning, you can tell me. 

17 Q. I just want to use your meaning. That's fine. 

18 A. Okay. There is an initial bias due to gender, 

19 so we compare them within males and within females. 

20 Have we removed the bias due to gender? Yes, we have 

21 removed the bias due to gender by comparing within 

22 males and females. You can no longer claim by 

23 comparison between smokers and nonsmokers within 

24 males is due to the bias in the gender distribution. 

25 I'm only looking at males. So you have removed bias 
1 there and you are confident you removed bias due to 

2 gender. 
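
The within-gender comparison described in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are hypothetical, the names `mean` and `compare_within_strata` do not come from the testimony, and the outcome is constructed to depend on gender alone so that any raw smoker/nonsmoker difference is purely a gender-mix effect.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def compare_within_strata(records):
    """records: iterable of (stratum, exposed, outcome) tuples.

    Returns {stratum: mean outcome of exposed minus mean outcome of unexposed},
    skipping any stratum that lacks one of the two groups.
    """
    groups = {}
    for stratum, exposed, outcome in records:
        groups.setdefault(stratum, {True: [], False: []})[bool(exposed)].append(outcome)
    return {
        stratum: mean(g[True]) - mean(g[False])
        for stratum, g in groups.items()
        if g[True] and g[False]
    }


# Hypothetical records: smokers are mostly male, nonsmokers mostly female,
# and the outcome is 10 for every male and 20 for every female.
records = (
    [("male", True, 10.0)] * 3 + [("female", True, 20.0)]
    + [("male", False, 10.0)] + [("female", False, 20.0)] * 3
)

# Raw comparison mixes the gender imbalance into the smoker/nonsmoker difference.
raw_difference = (
    mean([y for _, exposed, y in records if exposed])
    - mean([y for _, exposed, y in records if not exposed])
)

# Comparing within males and within females removes that particular imbalance.
within = compare_within_strata(records)
```

In this constructed example the raw difference is nonzero even though smoking has no effect on the outcome, while the difference within each gender is zero. It says nothing about imbalance in variables other than the one stratified on.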

3 THE WITNESS: Could we take a break? 

4 (Recess taken from 4:32 to 4:45 p.m.) 

5 BY MR. LOVE: 

6 Q. Professor Rubin, looking at page 3 of your 

7 supplemental report. 

8 A. Okay. 

9 Q. In the first full paragraph, starts out with the 

10 word "In particular." 

11 A. Yes. 

12 Q. You say there are three basic distribution 

13 conditions that must be met simultaneously for 

14 regression adjustment to be trustworthy; is that 

15 right? 

16 A. Right. 

17 Q. And then you say if any of those conditions is 

18 not satisfied, then regression adjustment can't be 

19 used; is that correct? 

20 A. Well I said cannot be — cannot be used to get 

21 reliable and trustworthy answers. I'm not trying to 

22 be precise. It can be used, meaning somebody did 

23 it. They pushed a button and did it so you can do 

24 it, but can you believe what comes out. 

25 Q. You say in your opinion you can't believe what 
1 comes out if it doesn't satisfy all three of those 

2 conditions? 

3 A. Basically, right. 

4 Q. And therefore when we looked at, for instance, 

5 Table 2, the main effects model back around page 11 

6 of your report, I guess — 

7 A. Right. 

8 Q. — no matter what figures we may have seen on 

9 the R column or for the five R for covariates after 

10 adjustments columns, if the B column shows figures 

11 greater than a half a standard deviation, it's your 

12 opinion that linear regression should not be used; is 

13 that right? 

14 MR. BIERSTEKER: Object to the form. 

15 A. Should not be used in the way it's used in the 

16 simple blind adjustments without looking. There are 

17 ways to use linear regression in combination with 

18 other things but I'm referring to — what I'm 

19 referring to here is the way the linear regression 

20 models are done in this report. 

21 Q. Simply using a probit linear regression model or 

22 logit linear regression model without doing some 

23 comparison of subgroups or whatever? 

24 A. Or linear log for the expenditure amounts, that 

25 sort of modeling exercise. 
1 Q. All those things would be improper in your 

2 opinion? 

3 A. Improper to do it that way and stop there; 

4 correct. Let me — To clarify just a little bit, if 

5 you are fooling around for a little data set for 

6 classroom exercise to illustrate a point, you could 

7 do something like that, wouldn't bother me. But if 

8 you are trying to get an answer that you believe is 

9 important, then you should exert effort to try to see 

10 that it is — that you are doing something that's 

11 reliable. 

12 Q. And the first condition that needs to be 

13 satisfied that we talked about earlier was the 

14 difference in the means of the propensity scores 

15 being small? 

16 A. Correct. 

17 Q. When expressed in terms of how many standard 

18 deviations away they are. 

19 A. Correct. Let me just hear the question again, 

20 make sure — 

21 Q. The first — 

22 A. It's getting towards the end of the day and I'm 

23 afraid I'm not attending with the same rigor I was 

24 earlier. 

25 Q. The first distributional condition that you say 
1 must be simultaneously met to use linear regression 

2 of any type in the way that Professor Zeger, Wyant 

3 and Miller did — 

4 A. And trust the results. 

5 Q. — is that the difference in the means of the 

6 propensity scores between the two groups being 

7 compared must be small. 

8 A. Correct. 

9 Q. The second — 

10 A. Unless — 

11 Q. — condition. 

12 A. Yes, okay. 

13 Q. Subject to this benign situation. 

14 A. Right. 

15 Q. The second — First of all, have you examined 

16 the NMES data to determine whether or not it fits the 

17 benign conditions? 

18 A. Yes. We can see going back to that main effects 

19 model that it's bigger than a half but the variance 

20 ratios are in that column, are — many of them are 

21 right around a half, which is very small. One is a 

22 fifth and one is three quarters but with a large 

23 bias, so I consider all those cases to be 

24 troublesome, all six. 

25 Q. Which of the A, B and C benign tests are not 
1 being met there? 

2 A. I'm not doing it in a — in a test way. I'm 

3 saying these are indications of problems so if you — 

4 if you look at everything except the first row, all 

5 the variance ratios are bumping around a half, which 

6 I consider troublesome, and the one that's point six 

7 five, where the variance ratio is point six five, the 

8 bias is extremely — exceedingly large, is point 

9 eight eight, so I'm putting those two things together 

10 and saying that's really disturbing. In the first 

11 row, the same sort of thing, the variance ratio isn't 

12 as bad as the other cases, it's point seven one, but 

13 the bias is really quite substantial, so — 

14 Q. The "bias" meaning? 

15 A. The B, the B — 

16 Q. Okay. 

17 A. — is quite a bit bigger than half and the 

18 variance ratio is not below a half, is less than 

19 three quarters, so I'm — I'm worried there. The 

20 second one, the bias is a half consideration, barely 

21 under half, but the variance ratio is a half so I'm 

22 worried there. 

23 Q. Professor Rubin, let me ask you this, then. Are 

24 these conditions under which the situation is benign 

25 that you list, and you list an A, B and C condition, 
1 is that something different than just combining the 

2 first of your three distributional conditions, being 

3 the means of the propensity scores being — 

4 difference in the means being small and the second 

5 one, the ratio of the variances being close to one? 

6 What I hear you saying is that if you can't meet the 

7 first condition, number one here on page 3 of your 

8 report — 

9 A. Right. 

10 Q. — or if you can't meet the second condition, 

11 number two, on the bottom of page 3 — 

12 A. Right. 

13 Q. — if you miss one of them, either one or two, 

14 then it will never be what you call benign; is that 

15 right? 

16 A. Correct. But I'm saying that in addition to 

17 having — having two, the ratio variances, you need A 

18 and C to be benign, you need — the distributions 

19 have to be nearly symmetric in both groups. 

20 Q. And — 

21 A. And you have to have the — okay. 

22 Q. It's even more restricting than? 

23 (Interruption by the reporter.) 

24 Q. It's an — In order to be benign, it's even more 

25 restrictive than failing to satisfy both one and two 
1 simultaneously? 

2 A. Exactly. 

3 Q. And the third distributional condition that you 

4 say must be satisfied in order to use the linear 

5 regressions in the way that Professor Zeger, Wyant 

6 and Miller is the ratio of the variance of the 

7 residuals of the original covariates after adjusting 

8 for the propensity score must be close to one; 

9 correct? 

10 A. Correct. 
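
The three distributional quantities discussed in the preceding questions (the difference in mean propensity score in standard deviation units, the ratio of propensity score variances, and the ratio of residual variances of each covariate after adjusting for the propensity score) can be computed roughly as in the sketch below. Several details are assumptions not stated in the testimony: the function name `balance_diagnostics`, the use of the square root of the average of the two group variances as the standard deviation, which group goes in the numerator of each ratio, and the use of a single pooled straight-line fit of each covariate on the score to form residuals.

```python
import numpy as np


def balance_diagnostics(score, covariates, exposed):
    """Return (B, R, residual_ratios) for two groups.

    score      : 1-D array of estimated propensity scores
    covariates : 2-D array, one column per original covariate
    exposed    : 1-D boolean array, True for one group (for example smokers)
    """
    score = np.asarray(score, dtype=float)
    covariates = np.asarray(covariates, dtype=float)
    exposed = np.asarray(exposed, dtype=bool)

    s1, s0 = score[exposed], score[~exposed]
    v1, v0 = s1.var(ddof=1), s0.var(ddof=1)

    # B: difference in mean propensity score, in standard deviation units
    # (assumed here: square root of the average of the two group variances).
    B = (s1.mean() - s0.mean()) / np.sqrt((v1 + v0) / 2.0)

    # R: ratio of the propensity score variances (assumed: exposed over unexposed).
    R = v1 / v0

    # Residuals of each covariate after a pooled straight-line fit on the score.
    design = np.column_stack([np.ones_like(score), score])
    coef, *_ = np.linalg.lstsq(design, covariates, rcond=None)
    resid = covariates - design @ coef

    # Ratio of residual variances, one value per covariate.
    residual_ratios = resid[exposed].var(axis=0, ddof=1) / resid[~exposed].var(axis=0, ddof=1)
    return B, R, residual_ratios
```

A call such as `balance_diagnostics(score, X, is_smoker)` returns the two summary numbers for the propensity score and one residual variance ratio per covariate, which can then be compared against whatever ranges the analyst regards as acceptable.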

11 Q. Now is there any publication you can identify 

12 for me that sets forth those three tests? 

13 A. Not stated just that way. I wish that there 

14 were. 

15 Q. Is there any publication that, you know, single 

16 publication that sets forth those three tests in some 

17 other way? 

18 A. Well the — Yes. In fact, the table that I 

19 refer to from this Cochran and Rubin paper in 1973 

20 shows what can happen for a single covariate, for one 

21 variable, when you try to do a linear regression. 

22 Q. This is Table 1? 

23 A. I think so. Yes. 

24 Q. Table 1 on page 10 of your report? 

25 A. Correct, correct. 
1 Q. Now — 

2 A. This is what happens for covariate, or one X 

3 variable. 

4 Q. Can you explain to me how Table 1 shows that the 

5 difference in the means of the propensity scores must 

6 be less than half a standard deviation apart? 

7 A. Okay. 

8 Q. Here in Table 1. 

9 A. Sure. The propensity score is just a particular 

10 kind of covariate. It's the same thing as if you 

11 made up an index that said you get one point if you 

12 are male, you get five points if you are sick, you 

13 get minus three points if you exercise a lot, make up 

14 an index like that, and that becomes a variable. 

15 Could have done it by putting down numbers like that 

16 and adding them up and you get a score for overall 

17 health, or could have done it analytically, 

18 "analytically" meaning by computing something, and 

19 the propensity score is one variable, a combination 

20 of all the pieces of information that went into it, a 

21 combination of 25 covariates, so that's one variable 

22 now. Propensity score is now one variable and think 

23 of it as one variable. Now go into this table and 

24 say, okay, let's suppose the bias is big on that one 

25 variable and let's say above, like a half or three 
1 quarters or one, and look to see what happens when 

2 the ratio of variances, that's the left-hand, first 

3 row is 2, where the ratio variance is a half, which 

4 is the bottom. How well does linear regression do at 

5 removing bias? 

6 Q. Let me ask you — 

7 A. It's terrible. 

8 Q. — ask you about this table. 

9 A. Right. 

10 Q. Is what we really should be concerned about is 

11 the figures inside the table or is the B value and 

12 the R values and the sort of axes of the table? 

13 A. Okay. What we can — we can calculate from our 

14 data are the B values and R values. We see those. 

15 So for example the propensity score is one variable, 

16 we can see what B is in the different — In Table 2 I 

17 showed you various values of B in different cells and 

18 what the R value is, so these values we know from 

19 looking at the analysis of the data we have got. 

20 What you don't know is which of these functions, the 

21 Y, moderate, nonlinear, moderate, marked, we don't 

22 know what God is doing to us there. Only God knows 

23 that. 

24 Q. That's to describe the true relationship in the 

25 world? 
1 A. Right, between Y, the outcome, expenditures or 

2 — this is for continuous outcomes so it relates to 

3 log expenditures. Only God knows that. We don't 

4 know what it is. Moderate and marked refer to the 

5 fact that how much curvature it has relative to the 

6 straight line, and those are Cochran's words on 

7 moderate and marked, and so let's see what happens 

8 when we try to do linear regression adjustments where 

9 we may know where we are with respect to R and B but 

10 we don't know which of these four columns we are 

11 living with. 

12 Q. Right. 

13 A. Okay? So let's take — if we look at — if we 

14 know R is one and B is a quarter, well, gee, linear 

15 regression does very well. It's just about a hundred 

16 percent, maybe 1 percent overadjustment, slight. How 

17 about if R is one and B is a half? Gets to 2 

18 percent. Now it's 2 percent of a bigger starting 

19 amount because B as a half is a bigger bias so it has 

20 a bigger bias than it did in case one but it's still 

21 not bad. When B is three quarters and the variance 

22 ratio is one, R is one, well 4 percent overcorrection 

23 for these conditions, but if you get 4 percent of a 

24 bigger bias to start with, that's good. In our case 

25 it would only be a few hundred million dollars, 
1 perhaps. B is one, then it still is sort of okay, 

2 although it doesn't get all of it, overcorrect. 

3 There is an 8 percent problem there. And we don't 

4 know which of those answers is right, whether it's a 

5 hundred percent or 101 percent, 102 percent or 108 

6 percent, because all we know, we are living in the B 

7 equals, let's say, three quarters and the R equals 

8 one, so that's what that would say. 

9 Are we ever there with these data? Well let's 

10 look another Table 2. Where are we? We are living 

11 in a world where R is about a half, sometimes a 

12 fifth, sometimes three quarters and B is over a 

13 half. So where are we living? We are living in the 

14 bottom row. And where are we living in the bottom 

15 row? We are living at about the B equals three 

16 quarters and R equals a half, or maybe B equals a 

17 half and R equals a half. That's where we are living 

18 when we look at these values. We are living in this 

19 region. And you look in that region and you see what 

20 is the linear regression adjustment doing. In some 

21 cases, to use Cochran's words, it's wildly erratic. 

22 Sometimes it overcorrects, sometimes it 

23 undercorrects. It can't be relied on and is varying 

24 with something that they are never worrying about, 

25 these modest — moderate nonlinears or marked 
1 nonlinears which they never look to see whether they 

2 were there. 

3 Q. All right. Just so I understand the bench marks 

4 that you have set forth here — 

5 A. Right. 

6 Q. — taking a simpler example, if we just had a 

7 case where R equaled one and B equaled three 

8 quarters, under two of your assumptions about how the 

9 world really works the linear regression could 

10 produce pretty good results; isn't that right? 

11 A. Pretty good, although 4 percent — 

12 Q. Well under the moderate it's only one percent; 

13 right? 

14 A. One percent would be pretty good and 4 percent 

15 would be okay but there is a bigger — when B equals 

16 three quarters there is a bigger initial bias to 

17 start with, so it's 4 percent of a bigger number than 

18 lower down in the table. 

19 Q. And you already told us that. 

20 A. Okay. 

21 Q. If R equals one and B equals one and you are 

22 under one of these moderate nonlinearity situations, 

23 it's only a 2 percent, again it's 2 percent of a 

24 bigger number you will tell me but it's still not 

25 terrible? 
1 A. That's right. It's the linear adjustment is 

2 trying to do the right thing and comes pretty close, 

3 and so I would consider such a situation relatively 

4 benign. There is an implicit assumption here, by the 

5 way, that the number of observations in both groups 

6 is the same. 

7 Q. Okay. The — you have not — You told us about 

8 these sort of four functions that could possibly try 

9 to describe the real world. 

10 A. Right. 

11 Q. And do you have any opinion as to which of those 

12 four functions, if any, fairly describes the 

13 relationship between smoking and health or smoking 

14 and healthcare costs? 

15 A. Absolutely no idea. 

16 Q. Does it have to be one of those four? 

17 A. Absolutely not. Could be much worse. 

18 Q. Could be something entirely different? 

19 A. Yeah, could be something entirely different. 

20 Q. And if it's something entirely different, we 

21 don't know how much overcorrection or undercorrection 

22 has taken place, do we? 

23 A. Correct, that's right. Pretty good reason to 

24 see what condition you are in. 

25 Q. Well you can look and see what condition you are 
1 in but if the real world is not anything of these 

2 things, what do you know from this table about what 

3 any model is doing? 

4 A. Okay. What you know, these are trying to be 

5 representative of moderate and marked and 

6 nonlinearity, of lack of model fit. So if you are in 

7 a case the R equals one and B equals a quarter, B 

8 equals a half, you can feel pretty comfortable about 

9 using linear regression unless the truth is really 

10 way off linear, and depending upon the kind of data 

11 you have, you may be able to actually see the way off 

12 if you look at it, which would be a wise thing to 

13 do. However if you are in the let's say R equals two,

14 B equals a quarter world, you are being told

15 there you are really in trouble unless you really can 

16 tell me that you think the world is linear, pretty 

17 close to being linear so that you are down to 

18 moderate. And even if you are down in the moderate, 

19 you can still be in deep trouble. So, you better 

20 look very carefully and you better not rely on the 

21 linear regression adjustment to do the adjustment for 

22 you. It could go either way. 

23 Q. If you have a situation where R equals a half 

24 and B equals one, you could have adjustments that are 

25 only 2 percent too much or 4 percent too little or 
1 you could have adjustments that are 40 percent too

2 much or — 

3 A. Correct. 

4 Q. — 13 percent too much? 

5 A. And these are in a situation, in that case R 

6 equals a half and B equals one. I know I'm repeating 

7 myself but I want to emphasize that the initial bias 

8 is large, so 40 percent of the large numbers is 

9 large. 
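
The percentages in this exchange can be illustrated with a short calculation. The sketch below is an illustration under stated assumptions, not the table under discussion: it assumes a single normally distributed covariate, equal group sizes, no true group effect, an exponential outcome curve exp(a*x), and adjustment by the pooled within-group regression slope. R is taken to be the variance ratio and B the standardized mean difference; the function name and the choice of curves are hypothetical.

```python
from math import exp


def percent_bias_reduction(a, b, r):
    """Percent of the initial bias removed by linear regression adjustment.

    Assumptions (hypothetical setup, not from the transcript):
      - covariate X is normal in each group, average variance 1
      - group means differ by b, variance ratio (group 1 / group 2) is r
      - outcome is exp(a * X) in both groups, so any gap in means is bias
      - equal group sizes, pooled within-group slope
    """
    var_t = 2.0 * r / (1.0 + r)
    var_c = 2.0 / (1.0 + r)
    mu_t, mu_c = b / 2.0, -b / 2.0

    # Mean of exp(a*X) for X ~ Normal(mu, var) is exp(a*mu + a^2*var/2).
    mean_t = exp(a * mu_t + a * a * var_t / 2.0)
    mean_c = exp(a * mu_c + a * a * var_c / 2.0)
    initial_bias = mean_t - mean_c

    # Cov(X, exp(a*X)) = a * var * E[exp(a*X)] for normal X.
    cov_t = a * var_t * mean_t
    cov_c = a * var_c * mean_c
    slope = (cov_t + cov_c) / (var_t + var_c)

    remaining_bias = initial_bias - slope * (mu_t - mu_c)
    return 100.0 * (1.0 - remaining_bias / initial_bias)


# The cell discussed above: R equals a half, B equals one.
for a in (0.5, -0.5, 1.0, -1.0):
    pbr = percent_bias_reduction(a, b=1.0, r=0.5)
    print(f"exp({a:+.1f} x): {pbr:.0f}% of initial bias removed")
```

Values above 100 mean the adjustment removes more than the initial bias (overcorrection), and values below 100 mean undercorrection. Under these assumptions the four curves give over- and undercorrections of roughly the size quoted in the question, and an exactly linear outcome curve would give 100.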

10 Q. But we might be in the 2 percent plus or 4 

11 percent minus category, too, or some other — 

12 A. Could be worse. 

13 Q. Could be worse or could be better than that, 

14 could be 1 percent depending on the what the real 

15 world — 

16 A. If the real world were linear, you took it all 

17 up. 

18 Q. What is it — Do you have an opinion as to 

19 whether the real world is linear or not with respect 

20 to the relationship between smoking and health or smoking

21 and healthcare costs? 

22 A. We know all models are wrong so it's not 

23 perfectly linear. The logit specifications, the 

24 probit specifications have linearity built into them 

25 that we know are wrong. We know that the log 
1 linearity of the expense fund is wrong. How wrong

2 and whether it makes a difference, we don't know, but 

3 we don't want to rely on the linearity assumptions to 

4 do these kinds of adjustments. We want to do 

5 something that's more robust to the specification of 

6 the underlying function. A good method would be, 

7 try to clarify, would be if we did the same table, we 

8 had percent reductions bias using good method. What 

9 this would show in contrast would be throughout the 

10 table there would be numbers between 90 and 110 

11 percent. That would be a good method. 

12 Q. So 90 to a 110 would be a good — 

13 A. And then even better would be now supplement it 

14 with regression adjustments across all versions of 

15 the table, and that's a paper I actually wrote in '79

16 on matched sampling which is referred to in doing the

17 Danish study, and that's why we did it that way, 

18 because that's what tends to come out if you do it 

19 well, if you do the method, the same sort of setup as 

20 this. You find numbers that are — give you some 

21 faith that what you are doing is not completely 

22 dependent upon something you have no idea about and 

23 you haven't even looked at. That's what a good 

24 method would be. This is not a good method, 

25 especially when you look at the data and you know you 
1 are in the R equals a half, B equals a quarter, B

2 equals a half, B equals three quarters cells, it 

3 just — 

4 Q. But since we don't know what the real-world 

5 distribution is, we don't really know what the 

6 percentage of overcorrection or undercorrection would 

7 be; right? 

8 A. That's correct. But we do know what we have 

9 done cannot be trusted, it's completely unreliable. 

10 It is making up numbers. 

11 Q. Well if the model is not right, it can't be 

12 trusted. If the model turns out to be right and 

13 actually reflects the real world quite reliably, you 

14 are going to get good results? 

15 A. If the world is exactly linear, that's right. 

16 Tautologically, if the model is exactly right and you 

17 apply the model correctly, you will get an answer 

18 that agrees with the model, not the way they did it. 

19 They are missing data. This is a model missing data 

20 and other things. 

21 Q. Now we talked earlier today about the two-way 

22 interaction part of the Table 4. 

23 A. Correct. 

24 Q. And the step-wise interactions that was Table 3. 

25 A. Correct. 
1 Q. Can you give me any references to literature

2 that say that those calculations need to be performed 

3 before you use linear regression in the way that Drs. 

4 Zeger, Wyant and Miller did? 

5 A. Step-wise in all two-way interactions? 

6 Q. Right. 

7 A. No, I don't think I can say that has to be 

8 done. What has to be done, as I think I said before, 

9 you should decide a priori what the right — strike 

10 "right" — what the collection of background 

11 variables you want to adjust for and then decide 

12 which interactions among them you think are important 

13 and put them in and then try to see how far apart 

14 these distributions are having put all that in, so 

15 they decided, "they" being the Zeger team, decided 

16 the main effects model was adequate, so that's the 

17 fundamental place to start. And these other tables 

18 are just showing how the variance ratios, for 

19 example, orthogonal to the propensity score, can get 

20 extreme if somebody wanted to through two-way 

21 interactions and even — or by step-wise kinds of 

22 things by including two-way interactions. I would 

23 not say that's something that one should always do, 

24 no. 

25 Q. Now do you review articles that are submitted to 
1 various peer-review journals?

2 A. Yes. 

3 Q. Have you ever reviewed an article written by 

4 someone other than one of your students, anyway, 

5 where the person uses linear regression and they have 

6 explicitly set forth their comparison of the 

7 distributions of the two groups and seen that it met 

8 the test, the bench marks that you have set forth and 

9 have gone through and done not only a main effects 

10 situation but an interaction effects situation as 

11 well? 

12 A. So you are setting it up so the person has to 

13 have used propensity scores and done it the way I did 

14 it here, but I said I wouldn't even recommend doing 

15 the interaction step-wise. Is that really the 

16 question, though? 

17 Q. That's not exactly step-wise but using 

18 propensity scores, not just in a main effects 

19 situation but also some kind of step-wise or 

20 interaction, whether it's two way or three way or 

21 certain variables are two way and other variables 

22 three way, whether it explicitly sets forth here is 

23 what I've done to be sure that I can use linear 

24 regression here. 

25 MR. BIERSTEKER: Objection, I think that 
1 mischaracterizes the testimony.

2 A. Yeah, I think it does as well. You are saying 

3 have I ever seen somebody do something that I 

4 wouldn't recommend that they do. I've — As I've 

5 said, the reason I was doing that is that here are 

6 the collection of variables that Zeger et al think 

7 should be adjusted for. Often there is the 

8 implication in people's minds that linear regression, 

9 just putting in main effects, adjusts for everything 

10 in a cell-by-cell basis the way you described it 

11 earlier, and this was just a way of illustrating the 

12 first step. If you really wanted to think it was 

13 doing that, you would have to put all these 

14 interactions in all the way up to the 25th order, 

15 which we didn't do. Again the right way to do it is 

16 to decide a priori — 

17 Q. Which interactions should be looked at? 

18 A. Yes. 

19 Q. My question is: Have you reviewed an article 

20 where the author has gone through and used linear 

21 regression and sets forth in his article that before 

22 going forward and using linear regression here I've 

23 looked at my two groups and I've done propensity 

24 scores, or something at least equivalent to the 

25 propensity scores on them, not in just a main effects 
1 way but I've looked at a way in which I consider to

2 be the important interaction effects? 

3 A. Then the important interaction effects would 

4 have been in the original model if they thought they 

5 were important, if the author thought they were 

6 important. 

7 Q. So they wouldn't need —

8 You are saying there wouldn't be a need to do a 

9 second analysis? 

10 A. No. 

11 Q. The main effects would include interactions 

12 because there would be interaction variables in the 

13 model? 

14 A. Correct. 

15 Q. All right. Have you reviewed an article where 

16 the author has done that, they used linear regression 

17 and they set forth a — something similar to the 

18 propensity score analysis where they have shown what 

19 the B values and the R values are and for the 

20 variables, including interaction variables that they 

21 actually used in their model? 

22 A. I have seen work that does that. Have I — Can 

23 I refer to a specific article where that's actually 

24 part of the article? Quite possibly not. It may be 

25 somewhere, it may not be. The reason I'm saying that 
1 is, that's part of what you should do when doing the

2 initial analysis of a data set, to see how you should 

3 analyze that data set. 

4 When you publish articles, almost always the 

5 editors are pressing you to cut and reduce pages, and 

6 so background work that confirms the analysis you are 

7 doing is reasonable is often asked, don't do it, just

8 say you did it. Do I know of any articles where they 

9 said they did it? Nothing comes to mind right away 

10 but I've seen articles in process where they do that 

11 kind of work to see how much overlap there is. 

12 Q. If you were looking for articles that actually 

13 explicitly talked about doing propensity score 

14 testing before doing linear regression, is — I think 

15 there is something called the current index to 

16 statistics that the Journal of American Statistical 

17 Association publishes. Is that a place where 

18 statisticians would go to look for something like 

19 that? 

20 A. Correct, that's one place. You can also look in 

21 the article I wrote for Annals of Internal Medicine 

22 that was on propensity score methods, and I have a 

23 list of publications in there that are in medical 

24 journals primarily that used propensity score methods 

25 and I think more or less advocate them, I think some 
1 even provide software for them that have been written

2 from the mid-'80s until now. I think the list of 

3 references may be 20 or 30 references. 

4 Q. Do you know if the medical journals, people are 

5 using propensity scores in the articles, if that 

6 would show up in what's called a Medline search? 

7 A. I've heard the Medline search and I think they 

8 probably would, so you can — it certainly would be a 

9 straightforward thing to check. Let me clarify 

10 that. Just because it doesn't — If an article did 

11 not use propensity score as a keyword, and you might 

12 not find it. It doesn't mean — a double negative is 

13 going to be coming here — it doesn't mean the 

14 authors aren't doing something intelligent to check 

15 the distribution of background variables and see how 

16 much overlap there is. 

17 Q. I understand you are saying there is other ways 

18 of checking the background distributions other than 

19 using something that's actually called propensity

20 scores. 

21 A. Correct. 

22 Q. Have you reviewed articles before they are 

23 published where the authors are using linear 

24 regression of some type and they haven't said in the 

25 article or anything that they have told you about 
1 whether they have done anything like propensity

2 scoring, whether it is actually called propensity 

3 scoring, to check the distribution of the background 

4 variables and where you — have you questioned that 

5 author and said have you done this sort of analysis? 

6 A. I probably have, yes. Let me — let me be clear 

7 at the moment about the kind of articles that I do 

8 review as a referee for publications you are talking 

9 about now. 

10 Q. Sure. 

11 A. I primarily review articles that are regarded as 

12 needing my kind of review in the sense of relatively 

13 sophisticated statistically. The kinds of articles 

14 that are just straightforward applications of methods 

15 tend not to get sent to me anymore, and when I do get 

16 sent them, for a few years I've passed them on to 

17 students and other colleagues. 

18 Q. Was there a time five or 10 years ago when you 

19 got more of those straightforward articles to review? 

20 A. Yes. 

21 Q. Going back to that time, did you receive such 

22 articles where authors using a regression don't set 

23 forth any explicit comparison of the background 

24 variable distributions, have you then required them 

25 if they want to get your approval and recommendation 

STIREWALT & ASSOCIATES 

P.O. BOX 18188, MINNEAPOLIS, MN 55418 1-800-553-1953 


Source: https://www.industrydocuments.ucsf.edu/docs/fkhd0001



722 


1 to go back and do such work? 

2 A. I don't have a specific recollection of one. I 

3 have vague recollection of several times I would make 

4 a — my report back to the associate editor and 

5 saying it seems to me this is — this is a question 

6 that's an important question this person is 

7 addressing, that this comparison of the background 

8 variables between these two groups, see what the — 

9 how the distributions compare, where they should be 

10 done, but it's up to the associate editor to make 

11 that kind of call. I would have made that kind of 

12 recommendation. I think it's essential. 

13 Q. Do you know whether in any of those situations 

14 the article was published even though they never went 

15 back and did that sort of analysis? 

16 A. I don't know but it wouldn't surprise me. 

17 Q. That it was published?

18 A. Yeah, articles like that are published and very 

19 often they are — there are different kinds of 

20 criticisms, and the author kind of writes around them 

21 sometimes, maybe the author said he really did do 

22 that or maybe the author writes back and said I 

23 didn't do that but this is an important topic and I'm 

24 just following this other article you already 

25 published, why should I have to be held to a higher 
1 standard? There are all kinds of reasons why things

2 get published, as I'm sure you are aware. 

3 Q. In peer review journals you serve as a reviewer 

4 on? 

5 A. In all peer review journals. There is no 

6 perfect article so there is always trade-offs that 

7 people make. 

8 Q. If you turn to page 7 of your supplemental 

9 report. In the second paragraph there, you are 

10 looking at a subgroup composed of males 19 to 34 

11 years old; is that correct? 

12 A. Correct. 

13 Q. In the last sentence of that paragraph, you say, 

14 "The statistical guidelines indicated in Table 1 

15 mean that, in this situation, Zeger's regressions 

16 could grossly overcorrect or grossly undercorrect for 

17 bias." Did I read that correctly? 

18 A. Yes. 

19 Q. Now Professor Rubin, does Zeger's regression 

20 actually overcorrect for bias in that group? 

21 A. Well we don't know, do we? We know what cell we 

22 are in in this table and since no one has ever looked 

23 to see at all what kind of relationships these 

24 outcome variables have to the covariates, we just 

25 don't know. 
1 Q. And —

2 A. We know the answer that's produced can't be 

3 trusted, that much we know. 

4 Q. Do you know if in that group the Zeger 

5 regressions grossly undercorrect for bias? 

6 MR. BIERSTEKER: Objection, asked and 

7 answered. 

8 A. Yeah. What we know is the Zeger regressions 

9 cannot be trusted, they are completely unreliable. 

10 Q. My question is: Do you know whether the Zeger 

11 regressions grossly undercorrect the bias? 

12 MR. BIERSTEKER: Asked and answered. 

13 A. We do not know what they are doing. They are 

14 basically tossing up coins and making answers. 

15 Q. Can you give a probability that the Zeger model 

16 actually grossly overcorrects for bias in that 

17 situation? 

18 A. Can I give a probability? This is kind of — 

19 it's not a statistical. Is this kind — you mean — 

20 Q. I assume you haven't done any kind of 

21 calculation to give a probability; correct? 

22 A. Correct, I haven't done any kind of probability 

23 because in fact in order to even try to understand 

24 whether the adjustment is doing anything sensible, 

25 you would have to look at the outcome variables, the 
1 Ys he is predicting from the Xs, and try to see what

2 kind of relationship they do have to the background 

3 variables, the covariates, and the way they are being 

4 estimated, they are very — they tend to be very 

5 noisy and so it would take care and I haven't done 

6 it. These analyses don't look at outcomes at all, 

7 they only look to background variables. 

8 Q. So I take it you can't give me any sort of 

9 probability that it grossly overcorrects or grossly 

10 undercorrects in the 19 to 34 situation? 

11 A. One hundred percent the way you worded

12 it. 

13 Q. One or the other. 

14 A. What I'm saying by that, I was being somewhat 

15 facetious, when you put the "or" in there, can you 

16 give me a probability that undercorrects or 

17 overcorrects, is the way I heard it — maybe that 

18 wasn't what was said — which means the probability 

19 is certain it does one or the other. 

20 Q. And that's true for just about anything? 

21 A. Everything where — 

22 Q. Any time a linear regression, it is almost 

23 certain it does some overcorrecting or some 

24 undercorrecting? 

25 A. Some little bit, yes. 
1 Q. Some little bit?

2 A. Not like this. 

3 Q. My question really was, I meant to ask you, can 

4 you give me any probability that the Zeger model 

5 actually grossly overcorrects for bias in these 

6 males-19-to-34 situation? 

7 A. I have no way. 

8 Q. And the same for grossly undercorrect? 

9 A. I have no idea. It's almost as if they didn't 

10 do the analysis. 

11 Q. If you look at Display 1 on page 14 of your 

12 report. The first part lists the variables used in 

13 the main effects model. 

14 A. Right. 

15 Q. And I believe in your report you said there were 

16 25 variables. 

17 A. I think, yes, that's what I said. 

18 Q. I want to make sure I understand, when I count 

19 the variables listed in parentheses I get 24, and 

20 would age then be the 25th variable? Is that the way 

21 to count the variables? 

22 A. Yes, yes, that's correct. 

23 Q. Have you made any examination of how those 

24 variables individually interact with smoking or with 

25 healthcare? 
1 A. As I said, I had not looked at the relationship,

2 I have not looked at the relationship between these 

3 variables and any outcomes. 

4 Q. Has anyone working for you or with you done 

5 that? 

6 A. No. 

7 (Plaintiffs' Deposition Exhibit 3560 was 

8 marked for identification.) 

9 BY MR. LOVE: 

10 Q. Professor Rubin, I'll show you what we have 

11 marked as Exhibit 3560. It's the supplemental report 

12 of William E. Wecker dated January 15, 1998. 

13 A. Yes. 

14 Q. You told me you had looked at that for the first 

15 time over the weekend in preparation for your 

16 deposition today. 

17 A. Correct. 

18 Q. And in reviewing that, did you reach any 

19 opinions about that report? 

20 A. I — My one opinion was that it was fairly 

21 clearly written, although rather sketchy in details. 

22 I also thought that the report appeared to make some 

23 — some good points. 

24 Q. Which were those? 

25 A. The main point, the wrong question, has to do 
1 with the alleged misconduct, the same point I've been

2 making, so therefore, tautologically, it's good; 

3 right? 

4 Q. Yes. 

5 A. According to me, otherwise it would be 

6 consistent. Studies the wrong population. I found 

7 that interesting, the idea that in NMES that there 

8 are people who are included there who are really not 

9 like the public aid population, and I haven't looked 

10 at that myself independently but I think it's an 

11 important criticism. I think I referred to that 

12 criticism earlier this morning when I was talking 

13 about what Harrison did in Oklahoma. I believe he 

14 used a different — different definition than NMES. 

15 Q. In your opinion, is it better to look at just 

16 the public aid population in NMES when you are 

17 estimating smoking-attributable expenditures for the 

18 Medicaid population of Minnesota? 

19 A. For public aid people. In principle, yes. 

20 There is an issue of do the data become too thin and 

21 do you want to borrow strength from marginally 

22 relevant but not the same kind of people, and they 

23 haven't looked carefully enough to see that. But 

24 certainly it's better in principle to use just those 

25 subjects who are as close as possible to the subjects 
1 you want to reference, so in principle that's right.

2 Q. Have you seen any of the computer data that goes 

3 along with Dr. Wecker's supplemental report? 

4 A. No, I have not. 

5 Q. Did you find any other points that you agreed 

6 with in Dr. Wecker's report? 

7 A. I — I found the points of inconsistency 

8 relevant in the sense that the points pre — 

9 inconsistent predictions across expense categories, 

10 across age and the results from the expenses from — 

11 from males, what they point out is — are things that 

12 give pause to the belief in the model. They don't 

13 make a lot of sense. So, they generate doubts about 

14 the model. The comments about — I guess it's at the 

15 end here. Oh, the omitted information, I regard that 

16 as — we already talked about this. Since I regard 

17 these kinds of analyses for smoking-attributable 

18 fractions and smoking-attributable expenses as 

19 descriptive in attempting to compare like with like 

20 as much as possible and the fact other people do 

21 include more Xs, more background variables in their 

22 models, then it certainly is true that you would like 

23 to include more information to make the comparisons 

24 of smokers and nonsmokers more comparable and more 

25 specific to a type — particular type of person, so 
1 omitted-information criticism I think is a realistic

2 criticism given the objectives, to compare like with 

3 like, accounting for such background characteristic. 

4 These other facts he has about the 

5 inconsistencies from injuries and mental disorders 

6 gives pause to thinking about the kinds of analyses 

7 they are — they are doing. 

8 Q. Is that something, is that a way you have used 

9 to test models in the past, take something that was 

10 designed to measure smoking-attributable healthcare 

11 costs in a population and then see what it would 

12 calculate if you looked at just certain kinds of — a 

13 subset of everything that you initially wanted to 

14 look at? 

15 A. Well it's a — it's a reality check and I think 

16 that the general advice to do so goes back for many, 

17 many years, to I think even work that was done on 

18 smoking a quarter century ago. When you try to get 

19 causal inferences from observational studies, one of 

20 the pieces of advice Cochran would talk about and I 

21 think he attributed to Sir Richard Doll and Fisher,

22 was because you don't have a randomized experiment, 

23 you try to make your causal hypothesis complex. This 

24 is very old advice. So that you — What you should 

25 find is that if there really is a causal link, it 
1 should obey certain kinds of rational kinds of

2 arguments. For example if smoking is bad for you — 

3 I think this is a specific example used; I don't have 

4 the reference. But if smoking is bad for you in 

5 terms of lung cancer, then you should find people 

6 that smoke two packs a day have more lung cancer than 

7 people who smoke one pack a day, adjusting for other 

8 background characteristics. 

9 Q. So the dose response situation? 

10 A. Yeah, exactly right. Three packs a day should 

11 be worse than two packs a day. 

12 Q. But this isn't a dose response that he is doing 

13 here. 

14 A. No. I'll give you an analogy that came up in 

15 another study that I was involved in, private versus 

16 public schools, which are more effective at improving 

17 education, the scores, achievement of kids, and if 

18 you find out that the private schools, let's say, 

19 have a special math program. I'm making this up. 

20 The example isn't quite — it's 15 years or more. If 

21 the private schools you are looking at have a special 

22 math program and then you do an analysis to find out 

23 whether the private schools do better than public 

24 schools at educating kids, adjusting for background 

25 variables which would include income of parents, 
1 education of parents, the age of the kids, incoming

2 scores of the kids, that sort of stuff, and you look 

3 at the outcome variable math scores and you look at 

4 the outcome variable reading test scores, and if the 

5 public and private schools have the exact same 

6 reading program but they have a very different math 

7 program and then you find that this analysis shows 

8 there is no effect of private versus public on math 

9 scores but a big effect on reading scores, it doesn't 

10 hold together very well. The theory doesn't — the 

11 theory that the schools are doing something new in 

12 math to improve performance doesn't hold together 

13 because it should be affecting math scores, not 

14 reading scores, so you don't believe the analysis. 

15 Q. You think that's an analogy to what Dr. Wecker 

16 is describing — 

17 A. Yes, I think it really is. 

18 Q. — at the bottom of page 5 and top of page 6. 

19 A. It has that flavor, that's right. It has the 

20 flavor I will now use the same technology I used to 

21 estimate the effect of private versus public on math 

22 scores. I'll use it to look at reading scores. I 

23 might see just the same effect in reading scores when 

24 they have no special program in reading. That gives 

25 me pause to think I'm estimating the effect of the 
1 new math program. And so what I — what I — it's

2 very — it's very cursory, as you know, here, but 

3 what I — reading this, basically he is doing 

4 something like that. He is — he is doing an 

5 analysis that is smoking, you wouldn't think, would 

6 have. Do you see the analogy I'm trying to make 

7 here? The — the lung cancer you see is like the 

8 math program that the new school has and the — these 

9 conditions that aren't related to smoking, 

10 supposedly, poisoning, injuries, mental disorders, 

11 automobile accidents, are ones like the reading 

12 tests. Schools have no difference in programs in 

13 reading so why would you find a big effect for the 

14 public versus private schools on reading. 

15 That's my analogy. Again, this is — this is 

16 two or three sentences. I don't know exactly what 

17 was done but it has the flavor of that to me, so I 

18 think it agrees with the old advice that for more 

19 than a quarter century ago was used to actually look 

20 at these observational studies and smoking and cancer 

21 and try to draw some reasonable conclusions about 

22 directions. The results kind of hang together in 

23 some sense. 

24 Q. If you look at the very next line, 

25 "'Attributable' Expenses Overstates Actual Expenses" 
1 on page 6.

2 A. Uh-huh. 

3 Q. Did you come to any conclusions about whether 

4 that presents a problem that the attributable 

5 expenses due to six factors actually turns out to be 

6 more than the total expenses, or is that something 

7 you would expect to see in statistics if you use — 

8 calculate attributable expenses to a whole variety of 

9 factors, that eventually you will get more than a 

10 hundred percent? 

11 A. Sure. This is a criticism, I think, of thinking 

12 of these descriptive quantities as having something 

13 to do with causality, that you are sort of — you are 

14 doing descriptive things and each of these are 

15 describing something in the population, assuming that 

16 you did the adjustment right, which I don't believe 

17 at all, the models do the adjustment right. What 

18 it's doing is it's sort of like that other checking 

19 point. You apply the same technology to something 

20 else and you get an answer that hangs together with 

21 the other answers if you are going to think of it as 

22 being a causal link. You sort of don't. But I have 

23 to see much more than this to really understand 

24 what's going on there, but the — I think these 

25 things are reality checks and they are, I think, good 
1 points to the extent that I understand them from this

2 very brief description. I think they are good 

3 criticisms in the same — this very old sense of if 

4 you are trying to do something that's causal that you 

5 hang together, that the conclusion should hang 

6 together and make a consistent story, and these do 

7 not seem to do that very well. 

8 Q. You don't believe that if you looked at any 

9 factors you can think of and ask whether those 

10 attributable — whether costs incurred by healthcare 

11 program were attributable to age, sex, hair color, 

12 everything, if all the ones you took were positively 

13 associated with costs and you took 10 or 20 of them 

14 and they were fairly major ones, don't you expect 

15 that to be more than a hundred percent, since you 

16 didn't take any negative ones? 

17 A. Yeah, typically I think you find to do that, 

18 that's right. 
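
The point in this last exchange, that separately computed attributable shares for several positively associated factors can add up to more than a hundred percent, can be shown with a small calculation. This is a minimal sketch under invented assumptions: three hypothetical factors, each of which triples a person's cost when present, and a population with one person for every combination of factors. None of the names or numbers come from the reports discussed.

```python
from itertools import product

FACTORS = ("factor_a", "factor_b", "factor_c")  # hypothetical factor names
MULTIPLIER = 3.0    # assumed: each factor present triples the cost
BASE_COST = 100.0   # assumed cost for a person with no factors


def cost(person):
    """Cost for one person: base cost times 3 for each factor present."""
    total = BASE_COST
    for name in FACTORS:
        if person[name]:
            total *= MULTIPLIER
    return total


def attributable_fraction(population, name):
    """Share of total cost that disappears if one factor is removed from everyone."""
    actual = sum(cost(p) for p in population)
    without = sum(cost({**p, name: False}) for p in population)
    return (actual - without) / actual


# One person for every combination of the three factors.
population = [dict(zip(FACTORS, combo))
              for combo in product([False, True], repeat=len(FACTORS))]

fractions = {name: attributable_fraction(population, name) for name in FACTORS}
for name, frac in fractions.items():
    print(f"{name}: {100 * frac:.0f}%")
print(f"sum: {100 * sum(fractions.values()):.0f}%")
```

In this constructed case each factor on its own accounts for half of total cost, so the three shares sum to more than the whole. The shares overlap because removing any one factor also removes cost that the other factors would be credited with.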

19 (Discussion off the record.) 

20 (Deposition concluded at approximately 

21 5:33 o'clock p.m.) 

22 

23 

24 

25 